Task parallel implementation of matrix multiplication on multi-socket multi-core architectures

Yizhuo Wang*, Weixing Ji, Xu Chen, Sensen Hu

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Matrix multiplication is a very important computation kernel in many science and engineering applications. This paper presents a parallel implementation framework for dense matrix multiplication on multi-socket multi-core architectures. Our framework first partitions the computation between the multi-core processors. Then a hybrid matrix multiplication algorithm is used on each processor, which combines the Winograd algorithm and the classical algorithm. In addition, a hierarchical work-stealing scheme is applied to achieve dynamic load balancing and enforce data locality in our framework. Performance experiments on two platforms show that our implementation gets significant performance gains compared with the state-of-the-art implementations.
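The hybrid scheme mentioned in the abstract can be illustrated with a minimal sequential sketch. The abstract does not give the paper's partitioning, cutoff, or scheduling details, so everything below is an assumption for illustration: the function names, the `cutoff` parameter, and the restriction to square matrices are hypothetical, and no parallelism or work-stealing is shown. The sketch applies the Winograd variant of Strassen's recursion (7 block products, 15 block additions) and falls back to the classical triple loop once a block is small or has odd dimension.

```python
def classical(A, B):
    """Classical O(n^3) product of two matrices stored as lists of rows."""
    inner = len(B)
    return [[sum(A[i][k] * B[k][j] for k in range(inner))
             for j in range(len(B[0]))] for i in range(len(A))]

def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def sub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def split(M):
    """Split a square matrix of even dimension into four quadrants."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])

def hybrid_matmul(A, B, cutoff=2):
    """Winograd-variant recursion above `cutoff`, classical below it.

    Assumes A and B are square and of equal size (hypothetical restriction).
    """
    n = len(A)
    if n <= cutoff or n % 2:  # small or odd-sized block: use classical
        return classical(A, B)
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    # Eight pre-additions on the input quadrants.
    S1 = add(A21, A22); S2 = sub(S1, A11); S3 = sub(A11, A21); S4 = sub(A12, S2)
    T1 = sub(B12, B11); T2 = sub(B22, T1); T3 = sub(B22, B12); T4 = sub(T2, B21)
    # Seven recursive block products.
    P1 = hybrid_matmul(A11, B11, cutoff)
    P2 = hybrid_matmul(A12, B21, cutoff)
    P3 = hybrid_matmul(S4, B22, cutoff)
    P4 = hybrid_matmul(A22, T4, cutoff)
    P5 = hybrid_matmul(S1, T1, cutoff)
    P6 = hybrid_matmul(S2, T2, cutoff)
    P7 = hybrid_matmul(S3, T3, cutoff)
    # Seven post-additions assemble the result quadrants.
    U2 = add(P1, P6); U3 = add(U2, P7); U4 = add(U2, P5)
    C11 = add(P1, P2); C12 = add(U4, P3)
    C21 = sub(U3, P4); C22 = add(U3, P5)
    top = [left + right for left, right in zip(C11, C12)]
    bottom = [left + right for left, right in zip(C21, C22)]
    return top + bottom
```

In a sketch like this, the cutoff trades the lower multiplication count of the recursive step against the overhead of the extra block additions and temporaries; a practical value would have to be tuned per platform.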

Original language: English
Title of host publication: Algorithms and Architectures for Parallel Processing - 15th International Conference, ICA3PP 2015, Proceedings
Editors: Guojun Wang, Gregorio Martinez Perez, Albert Zomaya, Kenli Li
Publisher: Springer Verlag
Pages: 93-104
Number of pages: 12
ISBN (Print): 9783319271361
Publication status: Published - 2015
Event: 15th International Conference on Algorithms and Architectures for Parallel Processing, ICA3PP 2015 - Zhangjiajie, China
Duration: 18 Nov 2015 to 20 Nov 2015

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9530
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 15th International Conference on Algorithms and Architectures for Parallel Processing, ICA3PP 2015
Country/Territory: China
City: Zhangjiajie
Period: 18/11/15 to 20/11/15

Keywords

  • Fast algorithms
  • Matrix multiplications
  • Multi-socket
  • Winograd
  • Work-stealing
