TY - GEN
T1 - Strong Scaling of OpenACC enabled Nek5000 on several GPU based HPC systems
AU - Vincent, Jonathan
AU - Gong, Jing
AU - Karp, Martin
AU - Peplinski, Adam
AU - Jansson, Niclas
AU - Podobas, Artur
AU - Jocksch, Andreas
AU - Yao, Jie
AU - Hussain, Fazle
AU - Markidis, Stefano
AU - Karlsson, Matts
AU - Pleiter, Dirk
AU - Laure, Erwin
AU - Schlatter, Philipp
N1 - Publisher Copyright: © 2022 ACM.
PY - 2022/1/7
Y1 - 2022/1/7
N2 - We present new results on the strong parallel scaling of the OpenACC-accelerated implementation of the high-order spectral element fluid dynamics solver Nek5000. The test case is a direct numerical simulation of fully developed turbulent flow in a straight pipe at two Reynolds numbers, Reτ = 360 and Reτ = 550, based on friction velocity and pipe radius. Strong scaling is tested on several GPU-enabled HPC systems, including the Swiss Piz Daint system, TACC's Longhorn, Jülich's JUWELS Booster, and Berzelius in Sweden. The performance results show that a speed-up of between 3 and 5 can be achieved with the GPU-accelerated version compared with the CPU version on these systems. The run-time for 20 timesteps drops from 43.5 to 13.2 seconds when the number of GPUs is increased from 64 to 512 for the Reτ = 550 case on the JUWELS Booster system. This illustrates the potential of the GPU-accelerated version for high throughput. At the same time, the strong scaling limit is significantly larger for GPUs, at about 2000-5000 elements per rank, compared to about 50-100 for a CPU rank.
AB - We present new results on the strong parallel scaling of the OpenACC-accelerated implementation of the high-order spectral element fluid dynamics solver Nek5000. The test case is a direct numerical simulation of fully developed turbulent flow in a straight pipe at two Reynolds numbers, Reτ = 360 and Reτ = 550, based on friction velocity and pipe radius. Strong scaling is tested on several GPU-enabled HPC systems, including the Swiss Piz Daint system, TACC's Longhorn, Jülich's JUWELS Booster, and Berzelius in Sweden. The performance results show that a speed-up of between 3 and 5 can be achieved with the GPU-accelerated version compared with the CPU version on these systems. The run-time for 20 timesteps drops from 43.5 to 13.2 seconds when the number of GPUs is increased from 64 to 512 for the Reτ = 550 case on the JUWELS Booster system. This illustrates the potential of the GPU-accelerated version for high throughput. At the same time, the strong scaling limit is significantly larger for GPUs, at about 2000-5000 elements per rank, compared to about 50-100 for a CPU rank.
KW - Benchmarking
KW - Computational Fluid Dynamics
KW - Nek5000
KW - OpenACC
KW - Scaling
UR - http://www.scopus.com/inward/record.url?scp=85122621284&partnerID=8YFLogxK
U2 - 10.1145/3492805.3492818
DO - 10.1145/3492805.3492818
M3 - Conference contribution
AN - SCOPUS:85122621284
T3 - ACM International Conference Proceeding Series
SP - 94
EP - 102
BT - Proceedings of International Conference on High Performance Computing in Asia-Pacific Region, HPC Asia 2022
PB - Association for Computing Machinery
T2 - 5th International Conference on High Performance Computing in Asia-Pacific Region, HPC Asia 2022
Y2 - 12 January 2022 through 14 January 2022
ER -