Sparse matrix partitioning for optimizing SpMV on CPU-GPU heterogeneous platforms

Akrem Benatia, Weixing Ji*, Yizhuo Wang, Feng Shi

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

11 Citations (Scopus)

Abstract

The sparse matrix-vector multiplication (SpMV) kernel dominates the computing cost in numerous applications. Most existing studies dedicated to improving this kernel target only one type of processing unit, mainly multicore CPUs or graphics processing units (GPUs), and do not explore the potential of the rapidly emerging CPU-GPU heterogeneous platforms. To take full advantage of these heterogeneous systems, the input sparse matrix has to be partitioned across the available processing units. The partitioning problem is made more challenging by the existence of many sparse formats whose performance depends on both the sparsity of the input matrix and the underlying hardware. Thus, the best performance depends not only on how the input sparse matrix is partitioned but also on which sparse format is used for each partition. To address this challenge, we propose in this article a new CPU-GPU heterogeneous method for computing the SpMV kernel that combines different sparse formats to achieve better performance and better utilization of CPU-GPU heterogeneous platforms. The proposed solution horizontally partitions the input matrix into multiple block-rows and predicts their best sparse formats using machine learning-based performance models. A mapping algorithm then assigns the block-rows to the CPU and GPU(s) available in the system. Our experimental results using real-world large unstructured sparse matrices on two different machines show a noticeable performance improvement.
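As a rough illustration of the partitioning idea described in the abstract (not the authors' implementation), the following Python sketch splits a sparse matrix horizontally into block-rows, picks a storage format for each block with a placeholder density heuristic standing in for the paper's machine learning-based performance models, and computes the SpMV result block by block on the CPU. The block count, the heuristic, and the serial execution are all assumptions for illustration; the actual method dispatches the block-rows to CPU and GPU devices.

import numpy as np
import scipy.sparse as sp

def partition_block_rows(A, num_blocks):
    # Split matrix A horizontally into contiguous block-rows (illustrative).
    rows = A.shape[0]
    bounds = np.linspace(0, rows, num_blocks + 1, dtype=int)
    return [A[bounds[i]:bounds[i + 1], :] for i in range(num_blocks)]

def choose_format(block):
    # Placeholder for the ML-based format predictor: a simple density
    # heuristic picks a dense layout for nearly-dense blocks and CSR
    # otherwise (assumption, not the paper's models).
    density = block.nnz / (block.shape[0] * block.shape[1])
    return "dense" if density > 0.5 else "csr"

def spmv_partitioned(A, x, num_blocks=4):
    # Compute y = A @ x block-row by block-row; a real heterogeneous
    # scheduler would map each block to a CPU or a GPU instead of
    # running them all serially here.
    parts = []
    for block in partition_block_rows(A.tocsr(), num_blocks):
        if choose_format(block) == "dense":
            parts.append(block.toarray() @ x)
        else:
            parts.append(block @ x)
    return np.concatenate(parts)

# Usage example on a random sparse matrix: the partitioned result
# matches the reference SpMV.
A = sp.random(1000, 1000, density=0.01, format="csr", random_state=0)
x = np.ones(1000)
assert np.allclose(spmv_partitioned(A, x), A @ x)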

Original language: English
Pages (from-to): 66-80
Number of pages: 15
Journal: International Journal of High Performance Computing Applications
Volume: 34
Issue number: 1
DOI
Publication status: Published - 1 Jan 2020
