A performance study of a dual Xeon-Phi cluster for the forward modelling of gravitational fields

Maricela Arroyo, Carlos Couder-Castañeda, Alfredo Trujillo-Alcantara, Israel Enrique Herrera-Diaz, Nain Vera-Chavez

Research output: Contribution to journal › Article

2 Citations (Scopus)

Abstract

© 2015 Maricela Arroyo et al. With at least 60 processing cores, the Xeon-Phi coprocessor is a truly multicore architecture, featuring an inter-core interconnect bandwidth of 240 GB/s, two levels of cache memory, a theoretical peak performance of 1.01 Tflops, and programming flexibility. Together, these features make the Xeon-Phi an excellent coprocessor for parallelizing applications that seek to reduce computation times. The objective of this work is to migrate a geophysical application that directly calculates the gravimetric tensor components and their derivatives, and in this way to study the performance of one and two Xeon-Phi coprocessors, both integrated on the same node and distributed across several nodes. This application allows us to analyze the design factors that drive good performance and to compare the results against a conventional multicore CPU. This research presents an efficient strategy based on nested parallelism with OpenMP: the outer level acts as a controller for the interconnected Xeon-Phi coprocessors, while the inner level parallelizes the loops. MPI is then used to reduce the results across the nodes of the cluster.
Original language: American English
Journal: Scientific Programming
DOI: 10.1155/2015/316012
State: Published - 1 Jan 2015

Fingerprint

Cache memory
Tensors
Program processors
Derivatives
Controllers
Coprocessor
Processing

Cite this

Arroyo, Maricela ; Couder-Castañeda, Carlos ; Trujillo-Alcantara, Alfredo ; Herrera-Diaz, Israel Enrique ; Vera-Chavez, Nain. / A performance study of a dual Xeon-Phi cluster for the forward modelling of gravitational fields. In: Scientific Programming. 2015.
@article{9414f5755ee849dcb7ba028fbebc5125,
title = "A performance study of a dual Xeon-Phi cluster for the forward modelling of gravitational fields",
author = "Maricela Arroyo and Carlos Couder-Casta{\~n}eda and Alfredo Trujillo-Alcantara and Herrera-Diaz, {Israel Enrique} and Nain Vera-Chavez",
year = "2015",
month = "1",
day = "1",
doi = "10.1155/2015/316012",
language = "American English",
journal = "Scientific Programming",
issn = "1058-9244",
publisher = "Hindawi Limited",

}

A performance study of a dual Xeon-Phi cluster for the forward modelling of gravitational fields. / Arroyo, Maricela; Couder-Castañeda, Carlos; Trujillo-Alcantara, Alfredo; Herrera-Diaz, Israel Enrique; Vera-Chavez, Nain.

In: Scientific Programming, 01.01.2015.

TY - JOUR

T1 - A performance study of a dual Xeon-Phi cluster for the forward modelling of gravitational fields

AU - Arroyo, Maricela

AU - Couder-Castañeda, Carlos

AU - Trujillo-Alcantara, Alfredo

AU - Herrera-Diaz, Israel Enrique

AU - Vera-Chavez, Nain

PY - 2015/1/1

Y1 - 2015/1/1

U2 - 10.1155/2015/316012

DO - 10.1155/2015/316012

M3 - Article

JO - Scientific Programming

JF - Scientific Programming

SN - 1058-9244

ER -