TY - JOUR
T1 - A performance study of a dual Xeon-Phi cluster for the forward modelling of gravitational fields
AU - Arroyo, Maricela
AU - Couder-Castañeda, Carlos
AU - Trujillo-Alcantara, Alfredo
AU - Herrera-Diaz, Israel Enrique
AU - Vera-Chavez, Nain
N1 - Publisher Copyright: © 2015 Maricela Arroyo et al.
PY - 2015
Y1 - 2015
N2 - With at least 60 processing cores, the Xeon-Phi coprocessor is a truly multicore architecture that offers an interconnection speed among cores of 240 GB/s, two levels of cache memory, a theoretical performance of 1.01 Tflops, and programming flexibility, all of which make the Xeon-Phi an excellent coprocessor for parallelizing applications that seek to reduce computational times. The objective of this work is to migrate a geophysical application designed to directly calculate the gravimetric tensor components and their derivatives, and in this way to investigate the performance of one and two Xeon-Phi coprocessors integrated on the same node and distributed across several nodes. This application allows the analysis of the design factors that drive good performance and a comparison of the results against a conventional multicore CPU. This research shows an efficient strategy based on nested parallelism using OpenMP, a design whose outer level acts as a controller of the interconnected Xeon-Phi coprocessors while its inner level is used for parallelizing the loops. MPI is subsequently used to reduce the information among the nodes of the cluster.
AB - With at least 60 processing cores, the Xeon-Phi coprocessor is a truly multicore architecture that offers an interconnection speed among cores of 240 GB/s, two levels of cache memory, a theoretical performance of 1.01 Tflops, and programming flexibility, all of which make the Xeon-Phi an excellent coprocessor for parallelizing applications that seek to reduce computational times. The objective of this work is to migrate a geophysical application designed to directly calculate the gravimetric tensor components and their derivatives, and in this way to investigate the performance of one and two Xeon-Phi coprocessors integrated on the same node and distributed across several nodes. This application allows the analysis of the design factors that drive good performance and a comparison of the results against a conventional multicore CPU. This research shows an efficient strategy based on nested parallelism using OpenMP, a design whose outer level acts as a controller of the interconnected Xeon-Phi coprocessors while its inner level is used for parallelizing the loops. MPI is subsequently used to reduce the information among the nodes of the cluster.
UR - http://www.scopus.com/inward/record.url?scp=84936806423&partnerID=8YFLogxK
U2 - 10.1155/2015/316012
DO - 10.1155/2015/316012
M3 - Article
SN - 1058-9244
VL - 2015
JO - Scientific Programming
JF - Scientific Programming
M1 - 316012
ER -