Towards Optimal Fast Matrix Multiplication on CPU-GPU Platforms

Senhao Shao, Yizhuo Wang*, Weixing Ji, Jianhua Gao

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The increasing computing power of GPUs brings new opportunities to accelerate fast matrix multiplication. Although researchers have proposed several optimization schemes for the Strassen algorithm on the GPU, these schemes do not fully utilize the computing resources of the CPU. In this paper, we propose a CPU-GPU heterogeneous implementation of the Winograd algorithm based on task graph scheduling, which uses a work-stealing scheduler to balance the load. We also propose two recursive task graph extension strategies: homogeneous and heterogeneous extension. We invoke different execution strategies at different recursion levels and design a predictor based on a random forest regression model to decide which strategy to use. Finally, we evaluate the approach on a CPU-GPU heterogeneous platform. The results show that, for matrices with dimension greater than 5000, the improved Winograd algorithm achieves average speedups of 1.6x, 1.5x, and 1.4x over cuBLAS, Winograd on the CPU, and Winograd on the GPU, respectively.
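In the fast matrix multiplication literature, "the Winograd algorithm" commonly refers to Winograd's variant of Strassen's method: it replaces the eight block products of a 2x2 block multiplication with seven, using 15 block additions instead of Strassen's 18. The sketch below is a minimal, single-threaded Python illustration of that textbook variant, assuming square matrices stored as lists of lists; it is not the authors' CPU-GPU implementation. The names `winograd`, `classical`, `quadrants`, and the `cutoff` parameter are hypothetical; a fixed cutoff stands in for the per-level strategy decision that the paper delegates to its random forest predictor.

```python
def mat_add(X, Y):
    """Element-wise X + Y for equally sized list-of-lists matrices."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def mat_sub(X, Y):
    """Element-wise X - Y for equally sized list-of-lists matrices."""
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def classical(A, B):
    """Textbook O(n^3) product, used below the cutoff and as a reference."""
    inner, cols = len(B), len(B[0])
    return [[sum(row[k] * B[k][j] for k in range(inner)) for j in range(cols)]
            for row in A]


def quadrants(M):
    """Split an even-sized square matrix into its 2x2 blocks (11, 12, 21, 22)."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])


def winograd(A, B, cutoff=2):
    """Strassen-Winograd product of square matrices A and B (hypothetical sketch)."""
    n = len(A)
    if n <= cutoff or n % 2:  # small or odd-sized: fall back to the classical product
        return classical(A, B)
    A11, A12, A21, A22 = quadrants(A)
    B11, B12, B21, B22 = quadrants(B)
    # 8 pre-additions on the input blocks.
    S1 = mat_add(A21, A22)
    S2 = mat_sub(S1, A11)
    S3 = mat_sub(A11, A21)
    S4 = mat_sub(A12, S2)
    T1 = mat_sub(B12, B11)
    T2 = mat_sub(B22, T1)
    T3 = mat_sub(B22, B12)
    T4 = mat_sub(T2, B21)
    # 7 recursive block products (instead of 8).
    M1 = winograd(A11, B11, cutoff)
    M2 = winograd(A12, B21, cutoff)
    M3 = winograd(S4, B22, cutoff)
    M4 = winograd(A22, T4, cutoff)
    M5 = winograd(S1, T1, cutoff)
    M6 = winograd(S2, T2, cutoff)
    M7 = winograd(S3, T3, cutoff)
    # 7 post-additions assemble the output blocks.
    U1 = mat_add(M1, M2)   # C11
    U2 = mat_add(M1, M6)
    U3 = mat_add(U2, M7)
    U4 = mat_add(U2, M5)
    U5 = mat_add(U4, M3)   # C12
    U6 = mat_sub(U3, M4)   # C21
    U7 = mat_add(U3, M5)   # C22
    top = [left + right for left, right in zip(U1, U5)]
    bottom = [left + right for left, right in zip(U6, U7)]
    return top + bottom


if __name__ == "__main__":
    print(winograd([[1, 2], [3, 4]], [[5, 6], [7, 8]], cutoff=1))  # [[19, 22], [43, 50]]
```

The S, T, M and U steps have only a few data dependencies between them (for example, S2 needs S1, and M6 needs S2 and T2), so each recursion level can be expressed as a small task graph, which is presumably the structure a task graph scheduler exploits.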
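The abstract states that the implementation balances load with a work-stealing scheduler over a task graph. As a hedged illustration of the general technique (not the paper's scheduler, whose details are not given here), the following Python sketch gives each worker its own deque: a worker pops its newest ready task from the tail, and an idle worker steals the oldest task from the head of a randomly chosen victim's deque. The function `run_work_stealing` and the task and dependency dictionaries are hypothetical. Because of CPython's global interpreter lock, this shows the scheduling logic rather than real parallel speedup.

```python
import random
import threading
import time
from collections import deque


def run_work_stealing(tasks, deps, num_workers=4):
    """Run a task DAG with per-worker deques and random-victim stealing.

    tasks: name -> callable(results) returning that task's value
    deps:  name -> list of prerequisite task names (must form a DAG)
    Returns (results, owner): task values and the id of the worker that ran each.
    """
    dependents = {name: [] for name in tasks}
    pending = {name: len(deps.get(name, [])) for name in tasks}
    for name, prereqs in deps.items():
        for p in prereqs:
            dependents[p].append(name)

    queues = [deque() for _ in range(num_workers)]
    lock = threading.Lock()  # guards pending counters, results, owner
    results, owner = {}, {}

    # Seed the deques round-robin with tasks that have no prerequisites.
    ready = [name for name, count in pending.items() if count == 0]
    for i, name in enumerate(ready):
        queues[i % num_workers].append(name)

    def worker(wid):
        rng = random.Random(wid)
        while True:
            with lock:
                if len(owner) == len(tasks):
                    return  # every task has finished
            try:
                name = queues[wid].pop()  # owner: newest task (LIFO)
            except IndexError:
                try:
                    victim = rng.randrange(num_workers)
                    name = queues[victim].popleft()  # thief: oldest task (FIFO)
                except IndexError:
                    time.sleep(0)  # nothing to steal yet; yield and retry
                    continue
            value = tasks[name](results)  # prerequisites are already in results
            with lock:
                results[name] = value
                owner[name] = wid
                for d in dependents[name]:
                    pending[d] -= 1
                    if pending[d] == 0:
                        queues[wid].append(d)  # newly ready work stays local

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, owner


if __name__ == "__main__":
    # Hypothetical diamond-shaped graph: e depends on c and d, which depend on a and b.
    tasks = {
        "a": lambda r: 2,
        "b": lambda r: 3,
        "c": lambda r: r["a"] + r["b"],
        "d": lambda r: r["a"] * r["b"],
        "e": lambda r: r["c"] * r["d"],
    }
    deps = {"c": ["a", "b"], "d": ["a", "b"], "e": ["c", "d"]}
    results, owner = run_work_stealing(tasks, deps)
    print(results["e"])  # 30
```

Stealing from the head takes the oldest work, while the owner keeps the most recently readied tasks, which typically favors locality. The sketch omits error handling: an exception inside a task would leave the other workers polling indefinitely.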

Original language: English
Title of host publication: Parallel and Distributed Computing, Applications and Technologies - 22nd International Conference, PDCAT 2021, Proceedings
Editors: Hong Shen, Yingpeng Sang, Yong Zhang, Nong Xiao, Hamid R. Arabnia, Geoffrey Fox, Ajay Gupta, Manu Malek
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 223-236
Number of pages: 14
ISBN (Print): 9783030967710
Publication status: Published - 2022
Event: 22nd International Conference on Parallel and Distributed Computing, Applications and Technologies, PDCAT 2021 - Guangzhou, China
Duration: 17 Dec 2021 to 19 Dec 2021

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13148 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 22nd International Conference on Parallel and Distributed Computing, Applications and Technologies, PDCAT 2021
Country/Territory: China
City: Guangzhou
Period: 17/12/21 to 19/12/21

Keywords

  • CPU-GPU heterogeneous architecture
  • Matrix multiplication
  • Random forest regression
  • Winograd algorithm
