TY - GEN
T1 - Load Balancing Optimizations for Distributed GMRES Algorithm
AU - Zhang, Yuxiang
AU - Guo, Shuaizhe
AU - Gao, Jianhua
AU - Ji, Weixing
AU - Wang, Yizhuo
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.
PY - 2025
Y1 - 2025
N2 - The Generalized Minimal Residual Method (GMRES) is one of the most important iterative algorithms for solving large-scale sparse linear systems, which are widely used in fields such as computational fluid dynamics and computational electromagnetics. As the scale of problems increases, multi-node distributed systems become one of the most popular running environments. Communication efficiency is usually the primary performance bottleneck for distributed GMRES. Traditional work reduces communication load mainly by balancing the computation load, while the balance of communication load is also important. This paper proposes a rule-based algorithm and a reinforcement learning (RL)-based algorithm to balance the communication load. By optimizing the partitioning of sparse matrices using the rule-based algorithm, the balance of both computation and communication loads among devices is improved. Experimental results show that the speedup can reach up to 1.34x. Moreover, RL-based algorithm improves the efficiency of the iterative algorithm by optimizing the task allocation of the partitioned sub-matrices. Experimental results present that the speedup can reach up to 1.30x.
AB - The Generalized Minimal Residual Method (GMRES) is one of the most important iterative algorithms for solving large-scale sparse linear systems, which are widely used in fields such as computational fluid dynamics and computational electromagnetics. As the scale of problems increases, multi-node distributed systems become one of the most popular running environments. Communication efficiency is usually the primary performance bottleneck for distributed GMRES. Traditional work reduces communication load mainly by balancing the computation load, while the balance of communication load is also important. This paper proposes a rule-based algorithm and a reinforcement learning (RL)-based algorithm to balance the communication load. By optimizing the partitioning of sparse matrices using the rule-based algorithm, the balance of both computation and communication loads among devices is improved. Experimental results show that the speedup can reach up to 1.34x. Moreover, RL-based algorithm improves the efficiency of the iterative algorithm by optimizing the task allocation of the partitioned sub-matrices. Experimental results present that the speedup can reach up to 1.30x.
KW - GMRES
KW - Load balancing
KW - Reinforcement learning
KW - Sparse matrix
UR - http://www.scopus.com/inward/record.url?scp=85218954534&partnerID=8YFLogxK
U2 - 10.1007/978-981-96-1551-3_5
DO - 10.1007/978-981-96-1551-3_5
M3 - Conference contribution
AN - SCOPUS:85218954534
SN - 9789819615506
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 47
EP - 56
BT - Algorithms and Architectures for Parallel Processing - 24th International Conference, ICA3PP 2024, Macau, China, October 29–31, 2024, Proceedings
A2 - Zhu, Tianqing
A2 - Li, Jin
A2 - Castiglione, Aniello
PB - Springer Science and Business Media Deutschland GmbH
T2 - 24th International Conference on Algorithms and Architectures for Parallel Processing, ICA3PP 2024
Y2 - 29 October 2024 through 31 October 2024
ER -