TY - JOUR
T1 - High-Performance Evaluation of the Interpolations and Anterpolations in the GPU-Accelerated Massively Parallel MLFMA
AU - He, Wei Jia
AU - Yang, Zeng
AU - Huang, Xiao Wei
AU - Wang, Wu
AU - Yang, Ming Lin
AU - Sheng, Xin Qing
N1 - Publisher Copyright: © 1963-2012 IEEE.
PY - 2023/7/1
Y1 - 2023/7/1
N2 - This communication investigates high-performance computation schemes for local Lagrange interpolation and anterpolation operations in the parallel graphics processing unit (GPU)-accelerated distributed-memory multilevel fast multipole algorithm (MLFMA). Two ELLPACK format-based schemes, namely, block ELLPACK (ELL-B) and hybrid compressed sparse column (CSC)-ELL-B (CSC-ELL-B), are proposed for the evaluation of interpolation and anterpolation operations, respectively, which ensure high computational throughput for GPU calculation. Optimizations using the GPU hierarchical memory architecture, the mechanism of the stream, and the central processing unit (CPU)/GPU asynchronous computation pattern are employed to further improve the overall performance. The proposed schemes are proven to be an order of magnitude faster than the conventional schemes for aggregation/disaggregation operations. For an aircraft model involving over 10 billion unknowns, the iteration time is reduced by over half, which is remarkable progress in the development of GPU-accelerated parallelization of MLFMA.
AB - This communication investigates high-performance computation schemes for local Lagrange interpolation and anterpolation operations in the parallel graphics processing unit (GPU)-accelerated distributed-memory multilevel fast multipole algorithm (MLFMA). Two ELLPACK format-based schemes, namely, block ELLPACK (ELL-B) and hybrid compressed sparse column (CSC)-ELL-B (CSC-ELL-B), are proposed for the evaluation of interpolation and anterpolation operations, respectively, which ensure high computational throughput for GPU calculation. Optimizations using the GPU hierarchical memory architecture, the mechanism of the stream, and the central processing unit (CPU)/GPU asynchronous computation pattern are employed to further improve the overall performance. The proposed schemes are proven to be an order of magnitude faster than the conventional schemes for aggregation/disaggregation operations. For an aircraft model involving over 10 billion unknowns, the iteration time is reduced by over half, which is remarkable progress in the development of GPU-accelerated parallelization of MLFMA.
KW - Graphics processing unit (GPU)
KW - large-scale electromagnetic scattering
KW - multilevel fast multipole algorithm (MLFMA)
KW - parallel
UR - http://www.scopus.com/inward/record.url?scp=85159646134&partnerID=8YFLogxK
U2 - 10.1109/TAP.2023.3269106
DO - 10.1109/TAP.2023.3269106
M3 - Article
AN - SCOPUS:85159646134
SN - 0018-926X
VL - 71
SP - 6231
EP - 6236
JO - IEEE Transactions on Antennas and Propagation
JF - IEEE Transactions on Antennas and Propagation
IS - 7
ER -