Towards Optimal Fast Matrix Multiplication on CPU-GPU Platforms

Senhao Shao, Yizhuo Wang*, Weixing Ji, Jianhua Gao

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, conference contribution, peer-reviewed

Abstract

GPUs have made increasing computing power available, creating new opportunities to accelerate fast matrix multiplication. Although researchers have proposed several optimization schemes for the Strassen algorithm on the GPU, these schemes do not fully utilize the computing resources of the CPU. In this paper, we propose a CPU-GPU heterogeneous implementation of the Winograd algorithm based on task graph scheduling. It uses a work-stealing scheduler to balance the load. We also propose two recursive task graph extension strategies: homogeneous and heterogeneous extension. We invoke different execution strategies at different recursion levels and design a predictor based on a random forest regression model to choose among them. Finally, we evaluate the approach experimentally on a CPU-GPU heterogeneous platform. The results show that, for matrices with dimension greater than 5000, the improved Winograd algorithm achieves average speedups of 1.6x, 1.5x, and 1.4x over cuBLAS, Winograd on the CPU, and Winograd on the GPU, respectively.
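For readers unfamiliar with the Winograd variant of Strassen's algorithm named in the abstract, the following is a minimal sequential sketch in plain Python. It is not the authors' implementation: it runs on the CPU only, omits the task graph, the work-stealing scheduler, and the random forest predictor, and assumes square matrices of equal size. The function names and the `cutoff` parameter are illustrative assumptions.

```python
def add(X, Y):
    """Element-wise sum of two equally sized matrices (lists of rows)."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def sub(X, Y):
    """Element-wise difference of two equally sized matrices."""
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def naive_mul(X, Y):
    """Classical triple-loop product, used as the recursion base case."""
    cols = list(zip(*Y))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in X]


def split(M):
    """Split an even-sized square matrix into its four quadrants."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])


def winograd(A, B, cutoff=2):
    """Winograd variant of Strassen: 7 block products, 15 block additions.

    Assumes A and B are square and the same size. Falls back to the
    classical product when the size is odd or at most `cutoff`
    (an illustrative threshold, not a tuned value).
    """
    n = len(A)
    if n <= cutoff or n % 2:
        return naive_mul(A, B)

    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)

    # Eight pre-additions on the input quadrants.
    S1 = add(A21, A22)
    S2 = sub(S1, A11)
    S3 = sub(A11, A21)
    S4 = sub(A12, S2)
    T1 = sub(B12, B11)
    T2 = sub(B22, T1)
    T3 = sub(B22, B12)
    T4 = sub(T2, B21)

    # Seven recursive block products.
    P1 = winograd(A11, B11, cutoff)
    P2 = winograd(A12, B21, cutoff)
    P3 = winograd(S4, B22, cutoff)
    P4 = winograd(A22, T4, cutoff)
    P5 = winograd(S1, T1, cutoff)
    P6 = winograd(S2, T2, cutoff)
    P7 = winograd(S3, T3, cutoff)

    # Seven post-additions that assemble the result quadrants.
    C11 = add(P1, P2)
    U2 = add(P1, P6)
    U3 = add(U2, P7)
    U4 = add(U2, P5)
    C12 = add(U4, P3)
    C21 = sub(U3, P4)
    C22 = add(U3, P5)

    top = [left + right for left, right in zip(C11, C12)]
    bottom = [left + right for left, right in zip(C21, C22)]
    return top + bottom
```

Calling `winograd(A, B)` on two square lists of rows should return the same product as `naive_mul(A, B)`. In a heterogeneous setting such as the one the abstract describes, the seven block products are the natural units to distribute across devices, but how the paper schedules them is not shown here.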

Original language: English
Title of host publication: Parallel and Distributed Computing, Applications and Technologies - 22nd International Conference, PDCAT 2021, Proceedings
Editors: Hong Shen, Yingpeng Sang, Yong Zhang, Nong Xiao, Hamid R. Arabnia, Geoffrey Fox, Ajay Gupta, Manu Malek
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 223-236
Number of pages: 14
ISBN (Print): 9783030967710
Publication status: Published - 2022
Event: 22nd International Conference on Parallel and Distributed Computing, Applications and Technologies, PDCAT 2021 - Guangzhou, China
Duration: 17 Dec 2021 to 19 Dec 2021

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13148 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 22nd International Conference on Parallel and Distributed Computing, Applications and Technologies, PDCAT 2021
Country/Territory: China
City: Guangzhou
Period: 17/12/21 to 19/12/21
