Abstract
The CPU-FPGA heterogeneous computing architecture is extensively employed in the embedded domain due to its low cost and power efficiency, with numerous sparse matrix-vector multiplication (SpMV) acceleration efforts already targeting this architecture. However, existing work rarely includes collaborative SpMV computations between CPU and FPGA, which limits the exploration of hybrid architectures that could potentially offer enhanced performance and flexibility. This article introduces an FPGA architecture design that supports multiprecision SpMV computations, including FP16, FP32, and FP64. Building on this, PTPS, a precision-aware SpMV task partitioning and dynamic scheduling algorithm tailored for the CPU-FPGA heterogeneous architecture, is proposed. The core idea of PTPS is lossless partitioning of sparse matrices across multiple precisions, prioritizing low-precision SpMV computations on the FPGA and high-precision computations on the CPU. PTPS not only leverages the strengths of CPU and FPGA for collaborative SpMV computations but also reduces data transmission overhead between them, thereby improving the overall computational efficiency. Experimental evaluation demonstrates that the proposed approach offers an average speedup of 1.57× over the CPU-only approach and 2.58× over the FPGA-only approach.
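The idea of lossless, precision-aware partitioning can be illustrated with a minimal sketch. It is not the PTPS algorithm itself, whose details the abstract does not give. The sketch assumes a matrix stored as (row, column, value) triples, treats a value as belonging to the narrowest IEEE 754 format that stores it exactly, and assumes a fixed rule that sends FP16 and FP32 entries to the FPGA and FP64 entries to the CPU. The function names (`narrowest_precision`, `partition_by_precision`, `spmv`) are hypothetical, both "devices" are simulated in ordinary Python arithmetic, and dynamic scheduling is not modeled.

```python
import struct


def narrowest_precision(value):
    """Return the narrowest format ('fp16', 'fp32', 'fp64') that stores value exactly."""
    for name, fmt in (("fp16", "<e"), ("fp32", "<f")):
        try:
            # Round-trip through the narrower format and compare with the original.
            if struct.unpack(fmt, struct.pack(fmt, value))[0] == value:
                return name
        except OverflowError:
            # Magnitude exceeds the range of this format; try the next wider one.
            pass
    return "fp64"


def partition_by_precision(entries):
    """Split (row, col, value) triples into per-precision lists without changing any value."""
    parts = {"fp16": [], "fp32": [], "fp64": []}
    for row, col, value in entries:
        parts[narrowest_precision(value)].append((row, col, value))
    return parts


def spmv(entries, x, n_rows):
    """Reference SpMV, y = A @ x, for a matrix given as (row, col, value) triples."""
    y = [0.0] * n_rows
    for row, col, value in entries:
        y[row] += value * x[col]
    return y


# Hypothetical 3x3 example matrix and input vector.
entries = [
    (0, 0, 0.5),             # exact in FP16
    (0, 2, 0.1),             # needs FP64
    (1, 1, 100000.0),        # beyond FP16 range, exact in FP32
    (2, 0, 1.0 + 2.0**-20),  # needs more mantissa bits than FP16 has
    (2, 2, -3.0),            # exact in FP16
]
x = [1.0, 2.0, 4.0]

parts = partition_by_precision(entries)

# Assumed placement rule: low precision on the "FPGA", high precision on the "CPU".
y_fpga = spmv(parts["fp16"] + parts["fp32"], x, 3)
y_cpu = spmv(parts["fp64"], x, 3)

# The partial results are summed element-wise to form the final vector.
y = [a + b for a, b in zip(y_fpga, y_cpu)]
```

Because every nonzero lands in exactly one partition and is never rounded, the sub-matrices sum back to the original matrix, which is what makes combining the two partial results valid. A real implementation would also have to weigh partition sizes against transfer cost and device throughput, which this sketch ignores.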
| Original language | English |
|---|---|
| Pages (from-to) | 3804-3815 |
| Number of pages | 12 |
| Journal | IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems |
| Volume | 44 |
| Issue number | 10 |
| Publication status | Published - 2025 |
| Externally published | Yes |
Keywords
- FPGA
- SpMV
- heterogeneous architecture
- mixed precision
- task scheduling
Title
PTPS: Precision-Aware Task Partitioning and Scheduling for SpMV on CPU-FPGA Heterogeneous Platforms