Optimizing General Sparse Matrix-Matrix Multiplication on the GPU

  • Yizhuo Wang*
  • Hongpeng Lin
  • Bingxin Wei
  • Jianhua Gao
  • Weixing Ji

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

General Sparse Matrix-Matrix Multiplication (SpGEMM) is a crucial computational kernel in scientific and engineering computing. Due to the irregular distribution of nonzero elements in sparse matrices, SpGEMM faces challenges such as non-contiguous memory access and workload imbalance. This article focuses on optimizing SpGEMM for GPU platforms. First, a lightweight machine learning model is trained to predict the optimal method for estimating the size of the result matrix. Next, different kernels are launched in groups to maximize GPU shared memory utilization and achieve load balancing. For the hash-based sparse accumulator, heuristic methods are used to select the optimal hash load factors and hash multiplier factors, thereby reducing the number of hash collisions. In addition, thread reduction is applied in the symbolic phase to enhance intra-block parallelism. Combining these optimization strategies, we implemented an adaptive SpGEMM algorithm for GPUs and compared its performance with current state-of-the-art algorithms. The results show that our algorithm achieves significant performance improvements.
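To illustrate the hash-based sparse accumulator the abstract refers to, the sketch below computes one row of C = A × B with both matrices in CSR form, using an open-addressing hash table keyed on column indices. This is a minimal single-threaded Python sketch under our own assumptions (function name, linear probing, a Knuth-style multiplier constant), not the paper's GPU implementation; in the actual algorithm the table would reside in shared memory and the load factor (table size relative to the row's nonzero count) and multiplier factor are chosen heuristically to reduce collisions.

```python
def spgemm_row_hash(row, A_ptr, A_col, A_val, B_ptr, B_col, B_val,
                    table_size, mult=2654435761):
    # Hash table with open addressing; table_size must exceed the number
    # of distinct output columns in this row (it sets the load factor).
    keys = [-1] * table_size   # -1 marks an empty slot
    vals = [0.0] * table_size
    # For each nonzero A[row, k], scatter a * B[k, :] into the table.
    for i in range(A_ptr[row], A_ptr[row + 1]):
        k, a = A_col[i], A_val[i]
        for j in range(B_ptr[k], B_ptr[k + 1]):
            col, prod = B_col[j], a * B_val[j]
            h = (col * mult) % table_size      # multiplier-based hash
            while keys[h] not in (-1, col):    # linear probing on collision
                h = (h + 1) % table_size
            keys[h] = col
            vals[h] += prod                    # accumulate partial product
    # Compact the occupied slots into (column, value) pairs sorted by column.
    return sorted((c, v) for c, v in zip(keys, vals) if c != -1)
```

For example, with A = [[2, 3]] and B = [[1, 0], [4, 5]] in CSR, row 0 of the product is [14, 15]; the symbolic phase of the algorithm would run the same scatter pattern but only count distinct columns to size C before the numeric phase fills in values.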

Original language: English
Article number: 152
Journal: Transactions on Architecture and Code Optimization
Volume: 22
Issue number: 4
DOIs
Publication status: Published - 15 Dec 2025
Externally published: Yes

Keywords

  • GPU
  • SpGEMM
  • Sparse matrix multiplication
  • parallel computing
