Task parallel implementation of matrix multiplication on multi-socket multi-core architectures

Yizhuo Wang*, Weixing Ji, Xu Chen, Sensen Hu

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Matrix multiplication is a very important computation kernel in many science and engineering applications. This paper presents a parallel implementation framework for dense matrix multiplication on multi-socket multi-core architectures. Our framework first partitions the computation between the multi-core processors. Then a hybrid matrix multiplication algorithm is used on each processor, which combines the Winograd algorithm and the classical algorithm. In addition, a hierarchical work-stealing scheme is applied to achieve dynamic load balancing and enforce data locality in our framework. Performance experiments on two platforms show that our implementation gets significant performance gains compared with the state-of-the-art implementations.
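
As a rough illustration of the hybrid idea in the abstract, the sketch below applies the Winograd variant of Strassen's recursion (7 block multiplications, 15 block additions) to large blocks and falls back to the classical triple loop for small or odd-sized blocks. It is a sequential, pure-Python sketch for square matrices only. The function names and the `cutoff` parameter are assumptions made for illustration and are not taken from the paper. The partitioning across processors and the hierarchical work-stealing scheme are not modeled here.

```python
def classical(a, b):
    """Classical O(n^3) product of two matrices given as lists of rows."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def add(x, y):
    return [[u + v for u, v in zip(r, s)] for r, s in zip(x, y)]


def sub(x, y):
    return [[u - v for u, v in zip(r, s)] for r, s in zip(x, y)]


def split(m):
    """Split an even-sized square matrix into four quadrants."""
    h = len(m) // 2
    return ([r[:h] for r in m[:h]], [r[h:] for r in m[:h]],
            [r[:h] for r in m[h:]], [r[h:] for r in m[h:]])


def hybrid_multiply(a, b, cutoff=2):
    """Winograd-variant recursion above `cutoff`, classical product below it.

    `cutoff` is a hypothetical tuning parameter; odd sizes also fall back
    to the classical product to keep the sketch short.
    """
    n = len(a)
    if n <= cutoff or n % 2:
        return classical(a, b)

    a11, a12, a21, a22 = split(a)
    b11, b12, b21, b22 = split(b)

    # 8 pre-additions on the input quadrants
    s1 = add(a21, a22)
    s2 = sub(s1, a11)
    s3 = sub(a11, a21)
    s4 = sub(a12, s2)
    t1 = sub(b12, b11)
    t2 = sub(b22, t1)
    t3 = sub(b22, b12)
    t4 = sub(t2, b21)

    # 7 recursive block multiplications
    p1 = hybrid_multiply(a11, b11, cutoff)
    p2 = hybrid_multiply(a12, b21, cutoff)
    p3 = hybrid_multiply(s4, b22, cutoff)
    p4 = hybrid_multiply(a22, t4, cutoff)
    p5 = hybrid_multiply(s1, t1, cutoff)
    p6 = hybrid_multiply(s2, t2, cutoff)
    p7 = hybrid_multiply(s3, t3, cutoff)

    # 7 post-additions to assemble the result quadrants
    u2 = add(p1, p6)
    u3 = add(u2, p7)
    u4 = add(u2, p5)
    c11 = add(p1, p2)
    c12 = add(u4, p3)
    c21 = sub(u3, p4)
    c22 = add(u3, p5)

    top = [left + right for left, right in zip(c11, c12)]
    bottom = [left + right for left, right in zip(c21, c22)]
    return top + bottom
```

In a tuned implementation the cutoff would be chosen much larger than in this sketch, since the recursion only pays off once the saved block multiplication outweighs the extra additions.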

Original language: English
Title of host publication: Algorithms and Architectures for Parallel Processing - 15th International Conference, ICA3PP 2015, Proceedings
Editors: Guojun Wang, Gregorio Martinez Perez, Albert Zomaya, Kenli Li
Publisher: Springer Verlag
Pages: 93-104
Number of pages: 12
ISBN (Print): 9783319271361
DOI
Publication status: Published - 2015
Event: 15th International Conference on Algorithms and Architectures for Parallel Processing, ICA3PP 2015 - Zhangjiajie, China
Duration: 18 Nov 2015 to 20 Nov 2015

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9530
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 15th International Conference on Algorithms and Architectures for Parallel Processing, ICA3PP 2015
Country/Territory: China
City: Zhangjiajie
Period: 18/11/15 to 20/11/15
