Abstract
General Sparse Matrix-Matrix Multiplication (SpGEMM) is a crucial computational kernel in scientific and engineering computing. Due to the irregular distribution of nonzero elements in sparse matrices, SpGEMM faces challenges such as non-contiguous memory access and workload imbalance. This article focuses on optimizing SpGEMM for GPU platforms. First, a lightweight machine learning model is trained to predict the optimal method for estimating the size of the result matrix. Next, different kernels are launched in groups to maximize GPU shared-memory utilization and achieve load balancing. For the hash-based sparse accumulator, heuristic methods are used to select the optimal hash load factors and hash multiplier factors, thereby reducing the number of hash collisions. In addition, thread reduction is applied in the symbolic phase to enhance intra-block parallelism. Combining these optimization strategies, we implemented an adaptive SpGEMM algorithm for GPUs and compared its performance with current state-of-the-art algorithms. The results show that our algorithm achieves significant performance improvements.
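To make the abstract's "hash-based sparse accumulator" concrete, here is a minimal sketch, not the paper's implementation: one row of C = A x B is accumulated in an open-addressing hash table sized by a load factor, with a multiplicative hash and linear probing on collisions. All names and the specific constants are illustrative assumptions.

```python
# Illustrative sketch of a hash-based sparse accumulator for SpGEMM
# (not the paper's actual kernel). Computes one row of C = A @ B with
# B stored in CSR form; the hash table size is the row's nnz upper
# bound divided by a load factor, as tuned heuristically in the paper.

def spgemm_row_hash(a_row_cols, a_row_vals, b_indptr, b_indices, b_vals,
                    load_factor=0.5, multiplier=2654435761):
    # Upper bound on nnz of this C row: total nnz of the B rows touched.
    upper = sum(b_indptr[j + 1] - b_indptr[j] for j in a_row_cols)
    size = max(1, int(upper / load_factor))
    keys = [-1] * size          # -1 marks an empty slot
    vals = [0.0] * size

    for j, a_val in zip(a_row_cols, a_row_vals):
        for p in range(b_indptr[j], b_indptr[j + 1]):
            col, prod = b_indices[p], a_val * b_vals[p]
            h = (col * multiplier) % size       # multiplicative hashing
            while keys[h] != -1 and keys[h] != col:
                h = (h + 1) % size              # linear probing on collision
            keys[h] = col
            vals[h] += prod

    # Compact the table into (column, value) pairs sorted by column.
    return sorted((k, v) for k, v in zip(keys, vals) if k != -1)
```

A larger load factor shrinks the table (saving shared memory on a GPU) at the cost of more probing; the paper's heuristic selection of load and multiplier factors trades these off per input.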
| Original language | English |
|---|---|
| Article number | 152 |
| Journal | ACM Transactions on Architecture and Code Optimization |
| Volume | 22 |
| Issue number | 4 |
| DOIs | |
| Publication status | Published - 15 Dec 2025 |
| Externally published | Yes |
Keywords
- GPU
- SpGEMM
- Sparse matrix multiplication
- Parallel computing