TY - JOUR
T1 - Dataflow optimization with layer-wise design variables estimation method for enflame CNN accelerators
AU - Chen, Tian
AU - Tan, Yu-an
AU - Zhang, Zheng
AU - Luo, Nan
AU - Li, Bin
AU - Li, Yuanzhang
N1 - Publisher Copyright:
© 2024 Elsevier Inc.
PY - 2024/7
Y1 - 2024/7
N2 - As convolution layers have proved to be the most time-consuming operations in convolutional neural network (CNN) algorithms, many efficient CNN accelerators have been designed to boost the performance of convolution operations. Previous works on CNN acceleration usually use fixed design variables across diverse convolutional layers, which leads to inefficient data movement and low utilization of computing resources. We tackle this issue by proposing a flexible dataflow optimization method that estimates design variables for each layer. The optimization method first narrows the design space using a priori constraints, and then enumerates all legal solutions to select the optimal design variables. We demonstrate the effectiveness of the proposed method by implementing representative CNN models (VGG-16, ResNet-18 and MobileNet V1) on Enflame Technology's programmable CNN accelerator, the General Computing Unit (GCU). The results indicate that our optimization significantly enhances the throughput of the convolution layers of ResNet, VGG and MobileNet on GCU, with improvements of up to 1.84×. Furthermore, it achieves up to a 2.08× increase in GCU utilization for the convolution layers of ResNet.
AB - As convolution layers have proved to be the most time-consuming operations in convolutional neural network (CNN) algorithms, many efficient CNN accelerators have been designed to boost the performance of convolution operations. Previous works on CNN acceleration usually use fixed design variables across diverse convolutional layers, which leads to inefficient data movement and low utilization of computing resources. We tackle this issue by proposing a flexible dataflow optimization method that estimates design variables for each layer. The optimization method first narrows the design space using a priori constraints, and then enumerates all legal solutions to select the optimal design variables. We demonstrate the effectiveness of the proposed method by implementing representative CNN models (VGG-16, ResNet-18 and MobileNet V1) on Enflame Technology's programmable CNN accelerator, the General Computing Unit (GCU). The results indicate that our optimization significantly enhances the throughput of the convolution layers of ResNet, VGG and MobileNet on GCU, with improvements of up to 1.84×. Furthermore, it achieves up to a 2.08× increase in GCU utilization for the convolution layers of ResNet.
KW - Convolutional neural networks (CNNs)
KW - General computing unit (GCU)
KW - Optimization
KW - Programmable dataflow
UR - http://www.scopus.com/inward/record.url?scp=85187196876&partnerID=8YFLogxK
U2 - 10.1016/j.jpdc.2024.104869
DO - 10.1016/j.jpdc.2024.104869
M3 - Article
AN - SCOPUS:85187196876
SN - 0743-7315
VL - 189
JO - Journal of Parallel and Distributed Computing
JF - Journal of Parallel and Distributed Computing
M1 - 104869
ER -