Massive parallelization of multilevel fast multipole algorithm for 3-D electromagnetic scattering problems on SW26010 many-core cluster

Xin Duo Liu; Wei Jia He; Ming Lin Yang; Xin Qing Sheng

doi:10.1007/s11227-023-05759-2

Xin Duo Liu, Wei Jia He^*, Ming Lin Yang, Xin Qing Sheng

^*Corresponding author for this work

School of Integrated Circuits and Electronics

Beijing Institute of Technology

Research output: Contribution to journal › Article › peer-review

Abstract

This paper presents a massively parallel implementation of the multilevel fast multipole algorithm (PMLFMA) on China's homegrown SW26010 many-core cluster, denoted SW-PMLFMA, for 3-D electromagnetic scattering problems. In this approach, the multilevel fast multipole algorithm (MLFMA) octree is first partitioned among the management processing elements (MPEs) of the SW26010 processors following the ternary partitioning scheme, using the message passing interface (MPI). Then the computationally intensive parts of the PMLFMA on each MPI process (matrix filling, aggregation, and disaggregation) are accelerated by using all 64 computing processing elements (CPEs) in the same core group as the MPE via the Athread parallel programming model. Different parallelization strategies are designed for the many-core accelerators to ensure high computational throughput. To match the special characteristics of local Lagrange interpolation, the compressed sparse row (CSR) and compressed sparse column (CSC) sparse matrix storage formats are used for storing the interpolation and anterpolation matrices, respectively, together with a specially designed cache mechanism of hybrid dynamic and static buffers in the scratchpad memory (SPM) to improve data access efficiency. Numerical results are included to demonstrate the efficiency and versatility of the proposed method. The proposed parallel scheme is shown to have excellent speedup.
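The CSR/CSC pairing mentioned in the abstract can be illustrated with a small sketch. It is not the authors' SW26010 code: it assumes, as is common in MLFMA, that the anterpolation matrix is the transpose of the interpolation matrix, and it uses a toy 2-point (linear) Lagrange interpolation from 3 coarse samples to 5 fine samples. All function names and data are hypothetical. Under that assumption, the CSR arrays of the interpolation matrix are exactly the CSC arrays of its transpose, so one set of arrays can serve both operations.

```python
# Illustrative sketch only; names and the toy matrix are assumptions, not the paper's code.

def build_csr(rows):
    """Build CSR arrays from a list of rows, each a list of (column, value) pairs."""
    data, indices, indptr = [], [], [0]
    for row in rows:
        for col, val in row:
            data.append(val)
            indices.append(col)
        indptr.append(len(data))  # end offset of this row
    return data, indices, indptr


def interpolate(csr, coarse):
    """fine = A @ coarse, walking A row by row (CSR access pattern)."""
    data, indices, indptr = csr
    fine = [0.0] * (len(indptr) - 1)
    for i in range(len(fine)):
        for k in range(indptr[i], indptr[i + 1]):
            fine[i] += data[k] * coarse[indices[k]]
    return fine


def anterpolate(csc_of_transpose, fine, n_coarse):
    """coarse = A^T @ fine, walking A^T column by column (CSC access pattern).

    The CSC arrays of A^T are the same three arrays as the CSR arrays of A.
    """
    data, indices, indptr = csc_of_transpose
    coarse = [0.0] * n_coarse
    for j in range(len(indptr) - 1):  # column j of A^T is row j of A
        for k in range(indptr[j], indptr[j + 1]):
            coarse[indices[k]] += data[k] * fine[j]
    return coarse


# Toy local interpolation: each fine sample depends on at most 2 coarse samples.
ROWS = [
    [(0, 1.0)],
    [(0, 0.5), (1, 0.5)],
    [(1, 1.0)],
    [(1, 0.5), (2, 0.5)],
    [(2, 1.0)],
]

A = build_csr(ROWS)
fine = interpolate(A, [1.0, 2.0, 3.0])
coarse = anterpolate(A, [1.0] * 5, n_coarse=3)
```

Because local Lagrange interpolation touches only a few neighbouring samples per output, each row holds a handful of nonzeros, which is what makes compressed storage attractive here.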

Original language	English
Pages (from-to)	8702-8718
Number of pages	17
Journal	Journal of Supercomputing
Volume	80
Issue number	7
DOIs	https://doi.org/10.1007/s11227-023-05759-2
Publication status	Published - May 2024

Keywords

Distributed memory parallelization
Electromagnetic scattering
Many-core acceleration
Multilevel fast multipole algorithm
SW26010 processor

Access to Document

10.1007/s11227-023-05759-2

Cite this

@article{2847206365ce4c02bb5d13570631e79d,
  title = "Massive parallelization of multilevel fast multipole algorithm for 3-D electromagnetic scattering problems on SW26010 many-core cluster",
  abstract = "This paper presents a massively parallel approach of the multilevel fast multipole algorithm (PMLFMA) on homegrown many-core SW26010 cluster of China, noted as (SW-PMLFMA), for 3-D electromagnetic scattering problems. In this approach, the multilevel fast multipole algorithm (MLFMA) octree is first partitioned among management processing elements (MPEs) of SW26010 processors following the ternary partitioning scheme using the message passing interface (MPI). Then, the computationally intensive parts of the PMLFMA on each MPI process, matrix filling, aggregation and disaggregation are accelerated by using all the 64 computing processing elements (CPEs) in the same core group of the MPE via the Athread parallel programming model. Different parallelization strategies are designed for many-core accelerators to ensure a high computational throughput. In coincidence with the special characteristic of local Lagrange interpolation, the compressed sparse row (CSR) and the compressed sparse column (CSC) sparse matrix storage format is used for storing interpolation and anterpolation matrices, respectively, together with a specially designed cache mechanism of hybrid dynamic and static buffers using the scratchpad memory (SPM) to improve data access efficiency. Numerical results are included to demonstrate the efficiency and versatility of the proposed method. The proposed parallel scheme is shown to have excellent speedup.",
  keywords = "Distributed memory parallelization, Electromagnetic scattering, Many-core acceleration, Multilevel fast multipole algorithm, SW26010 processor",
  author = "Liu, {Xin Duo} and He, {Wei Jia} and Yang, {Ming Lin} and Sheng, {Xin Qing}",
  note = "Publisher Copyright: {\textcopyright} The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2023.",
  year = "2024",
  month = may,
  doi = "10.1007/s11227-023-05759-2",
  language = "English",
  volume = "80",
  pages = "8702--8718",
  journal = "Journal of Supercomputing",
  issn = "0920-8542",
  publisher = "Springer Netherlands",
  number = "7",
}

TY - JOUR
T1 - Massive parallelization of multilevel fast multipole algorithm for 3-D electromagnetic scattering problems on SW26010 many-core cluster
AU - Liu, Xin Duo
AU - He, Wei Jia
AU - Yang, Ming Lin
AU - Sheng, Xin Qing
N1 - Publisher Copyright: © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2023.
PY - 2024/5
Y1 - 2024/5
AB - This paper presents a massively parallel approach of the multilevel fast multipole algorithm (PMLFMA) on homegrown many-core SW26010 cluster of China, noted as (SW-PMLFMA), for 3-D electromagnetic scattering problems. In this approach, the multilevel fast multipole algorithm (MLFMA) octree is first partitioned among management processing elements (MPEs) of SW26010 processors following the ternary partitioning scheme using the message passing interface (MPI). Then, the computationally intensive parts of the PMLFMA on each MPI process, matrix filling, aggregation and disaggregation are accelerated by using all the 64 computing processing elements (CPEs) in the same core group of the MPE via the Athread parallel programming model. Different parallelization strategies are designed for many-core accelerators to ensure a high computational throughput. In coincidence with the special characteristic of local Lagrange interpolation, the compressed sparse row (CSR) and the compressed sparse column (CSC) sparse matrix storage format is used for storing interpolation and anterpolation matrices, respectively, together with a specially designed cache mechanism of hybrid dynamic and static buffers using the scratchpad memory (SPM) to improve data access efficiency. Numerical results are included to demonstrate the efficiency and versatility of the proposed method. The proposed parallel scheme is shown to have excellent speedup.

KW - Distributed memory parallelization
KW - Electromagnetic scattering
KW - Many-core acceleration
KW - Multilevel fast multipole algorithm
KW - SW26010 processor
UR - http://www.scopus.com/inward/record.url?scp=85177801302&partnerID=8YFLogxK
U2 - 10.1007/s11227-023-05759-2
DO - 10.1007/s11227-023-05759-2
M3 - Article
AN - SCOPUS:85177801302
SN - 0920-8542
VL - 80
SP - 8702
EP - 8718
JO - Journal of Supercomputing
JF - Journal of Supercomputing
IS - 7
ER -
