TY - JOUR
T1 - Performance of a Code Migration for the Simulation of Supersonic Ejector Flow to SMP, MIC, and GPU Using OpenMP, OpenMP+LEO, and OpenACC Directives
AU - Couder-Castañeda, C.
AU - Barrios-Piña, H.
AU - Gitler, I.
AU - Arroyo, M.
N1 - Publisher Copyright:
© 2015 C. Couder-Castañeda et al.
PY - 2015
Y1 - 2015
N2 - A serial source code for simulating a supersonic ejector flow is accelerated using parallelization based on OpenMP and OpenACC directives. The purpose is to reduce the development costs and to simplify the maintenance of the application due to the complexity of the FORTRAN source code. This research follows well-proven strategies in order to obtain the best performance in both OpenMP and OpenACC. OpenMP has become the programming standard for scientific multicore software and OpenACC is one true alternative for graphics accelerators without the need of programming low level kernels. The strategies using OpenMP are oriented towards reducing the creation of parallel regions, tasks creation to handle boundary conditions, and a nested control of the loop time for the programming in offload mode specifically for the Xeon Phi. In OpenACC, the strategy focuses on maintaining the data regions among the executions of the kernels. Experiments for performance and validation are conducted here on a 12-core Xeon CPU, Xeon Phi 5110p, and Tesla C2070, obtaining the best performance from the latter. The Tesla C2070 presented an acceleration factor of 9.86X, 1.6X, and 4.5X compared against the serial version on CPU, 12-core Xeon CPU, and Xeon Phi, respectively.
AB - A serial source code for simulating a supersonic ejector flow is accelerated using parallelization based on OpenMP and OpenACC directives. The purpose is to reduce the development costs and to simplify the maintenance of the application due to the complexity of the FORTRAN source code. This research follows well-proven strategies in order to obtain the best performance in both OpenMP and OpenACC. OpenMP has become the programming standard for scientific multicore software and OpenACC is one true alternative for graphics accelerators without the need of programming low level kernels. The strategies using OpenMP are oriented towards reducing the creation of parallel regions, tasks creation to handle boundary conditions, and a nested control of the loop time for the programming in offload mode specifically for the Xeon Phi. In OpenACC, the strategy focuses on maintaining the data regions among the executions of the kernels. Experiments for performance and validation are conducted here on a 12-core Xeon CPU, Xeon Phi 5110p, and Tesla C2070, obtaining the best performance from the latter. The Tesla C2070 presented an acceleration factor of 9.86X, 1.6X, and 4.5X compared against the serial version on CPU, 12-core Xeon CPU, and Xeon Phi, respectively.
UR - http://www.scopus.com/inward/record.url?scp=84938630387&partnerID=8YFLogxK
U2 - 10.1155/2015/739107
DO - 10.1155/2015/739107
M3 - Artículo
SN - 1058-9244
VL - 2015
JO - Scientific Programming
JF - Scientific Programming
M1 - 739107
ER -