TY - JOUR
T1 - Solving Electromagnetic Scattering Problems With Tens of Billions of Unknowns Using GPU Accelerated Massively Parallel MLFMA
AU - He, Wei Jia
AU - Yang, Zeng
AU - Huang, Xiao Wei
AU - Wang, Wu
AU - Yang, Ming Lin
AU - Sheng, Xin Qing
N1 - Publisher Copyright: © 1963-2012 IEEE.
PY - 2022/7/1
Y1 - 2022/7/1
N2 - In this article, a massively parallel approach of the multilevel fast multipole algorithm (PMLFMA) on a graphics processing unit (GPU) heterogeneous platform, denoted GPU-PMLFMA, is presented for solving extremely large electromagnetic scattering problems involving tens of billions of unknowns. In this approach, a flexible and efficient ternary partitioning scheme is first employed to partition the MLFMA octree among message-passing interface (MPI) processes. Then, the computationally intensive parts of the PMLFMA on each MPI process, such as matrix filling, aggregation, and disaggregation, are accelerated on the GPU. Different GPU parallelization strategies, consistent with the ternary parallel MLFMA approach, are designed to ensure high computational throughput. A dedicated memory usage strategy is designed to improve computational efficiency and facilitate data reuse. An asynchronous CPU/GPU computing pattern is designed using OpenMP and compute unified device architecture (CUDA) to accelerate the CPU and GPU execution parts, respectively, and to overlap their computation time. GPU architecture-based optimization strategies are implemented to further improve the computational efficiency. Numerical results demonstrate that the proposed GPU-PMLFMA achieves a speedup of more than three times compared with the eight-threaded conventional PMLFMA. Solutions of scattering by electrically large and complicated objects with about 24 000 wavelengths and over 41.8 billion unknowns are presented.
AB - In this article, a massively parallel approach of the multilevel fast multipole algorithm (PMLFMA) on a graphics processing unit (GPU) heterogeneous platform, denoted GPU-PMLFMA, is presented for solving extremely large electromagnetic scattering problems involving tens of billions of unknowns. In this approach, a flexible and efficient ternary partitioning scheme is first employed to partition the MLFMA octree among message-passing interface (MPI) processes. Then, the computationally intensive parts of the PMLFMA on each MPI process, such as matrix filling, aggregation, and disaggregation, are accelerated on the GPU. Different GPU parallelization strategies, consistent with the ternary parallel MLFMA approach, are designed to ensure high computational throughput. A dedicated memory usage strategy is designed to improve computational efficiency and facilitate data reuse. An asynchronous CPU/GPU computing pattern is designed using OpenMP and compute unified device architecture (CUDA) to accelerate the CPU and GPU execution parts, respectively, and to overlap their computation time. GPU architecture-based optimization strategies are implemented to further improve the computational efficiency. Numerical results demonstrate that the proposed GPU-PMLFMA achieves a speedup of more than three times compared with the eight-threaded conventional PMLFMA. Solutions of scattering by electrically large and complicated objects with about 24 000 wavelengths and over 41.8 billion unknowns are presented.
KW - Compute unified device architecture (CUDA)
KW - OpenMP
KW - extremely large-scale problems
KW - message-passing interface (MPI) parallelization
KW - multilevel fast multipole algorithm (MLFMA)
KW - scattering problems
UR - http://www.scopus.com/inward/record.url?scp=85127527174&partnerID=8YFLogxK
U2 - 10.1109/TAP.2022.3161520
DO - 10.1109/TAP.2022.3161520
M3 - Article
AN - SCOPUS:85127527174
SN - 0018-926X
VL - 70
SP - 5672
EP - 5682
JO - IEEE Transactions on Antennas and Propagation
JF - IEEE Transactions on Antennas and Propagation
IS - 7
ER -