ConCeal: A Winograd convolution code template for optimising GCU in parallel

Tian Chen, Yu an Tan, Thar Baker, Haokai Wu, Qiuyu Zhang, Yuanzhang Li*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

By minimising arithmetic operations, Winograd convolution substantially reduces the computational complexity of convolution, a pivotal operation in the training and inference stages of Convolutional Neural Networks (CNNs). This study leverages the hardware architecture and capabilities of Shanghai Enflame Technology's AI accelerator, the General Computing Unit (GCU). We develop a code template named ConCeal for Winograd convolution with 3 × 3 kernels, employing a set of interrelated optimisations, including task partitioning, memory layout design, and parallelism. These optimisations fully exploit the GCU's computing resources by optimising dataflow and parallelising the execution of tasks on GCU cores, thereby improving Winograd convolution performance. Moreover, the integrated optimisations in the template can be applied efficiently to other operators, such as max pooling. Using this template, we implement four Winograd convolution operators on GCU and assess their performance. The experimental results show that the ConCeal operators achieve a maximum speedup of 2.04× and an average speedup of 1.49× over the fastest GEMM-based convolution implementations on GCU. Additionally, the ConCeal operators demonstrate competitive or superior computing resource utilisation in certain ResNet and VGG convolution layers when compared to cuDNN on RTX2080.
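To make the arithmetic saving concrete, the following is a minimal NumPy sketch of the textbook Winograd F(2×2, 3×3) algorithm for a single tile. It is not the ConCeal template and does not model GCU task partitioning, memory layout, or parallelism; the function names `winograd_tile` and `direct_tile` are hypothetical and chosen only for illustration. It computes one 2 × 2 output tile from a 4 × 4 input tile and a 3 × 3 kernel using 16 element-wise multiplications, where the direct method needs 36.

```python
import numpy as np

# Standard F(2x2, 3x3) transform matrices (textbook values; an assumption
# here, since the abstract does not state which Winograd variant is used).
B_T = np.array([[1,  0, -1,  0],
                [0,  1,  1,  0],
                [0, -1,  1,  0],
                [0,  1,  0, -1]], dtype=np.float64)
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]], dtype=np.float64)
A_T = np.array([[1, 1,  1,  0],
                [0, 1, -1, -1]], dtype=np.float64)


def winograd_tile(d, g):
    """Map a 4x4 input tile d and a 3x3 kernel g to a 2x2 output tile."""
    U = G @ g @ G.T         # kernel transform, 4x4
    V = B_T @ d @ B_T.T     # input transform, 4x4
    M = U * V               # 16 element-wise multiplications
    return A_T @ M @ A_T.T  # output transform, 2x2


def direct_tile(d, g):
    """Reference: direct sliding-window correlation (36 multiplications)."""
    out = np.empty((2, 2))
    for i in range(2):
        for j in range(2):
            out[i, j] = np.sum(d[i:i + 3, j:j + 3] * g)
    return out


rng = np.random.default_rng(0)
d = rng.standard_normal((4, 4))
g = rng.standard_normal((3, 3))
# The two methods agree up to floating-point rounding.
assert np.allclose(winograd_tile(d, g), direct_tile(d, g))
```

In a full convolution the kernel transform `U` would typically be computed once per filter and reused across all tiles, so the per-tile cost is dominated by the input transform, the element-wise products, and the output transform.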

Original language: English
Article number: 105108
Journal: Journal of Parallel and Distributed Computing
Volume: 203
Publication status: Published - Sept 2025

Keywords

  • Parallel access
  • Parallel channel
  • Parallel computing
  • Parallel Winograd convolution
