TY - JOUR
T1 - ADS-CNN
T2 - Adaptive Dataflow Scheduling for lightweight CNN accelerator on FPGAs
AU - Wan, Yi
AU - Xie, Xianzhong
AU - Chen, Junfan
AU - Xie, Kunpeng
AU - Yi, Dezhi
AU - Lu, Ye
AU - Gai, Keke
N1 - Publisher Copyright:
© 2024 Elsevier B.V.
PY - 2024/9
Y1 - 2024/9
N2 - Lightweight convolutional neural networks (CNNs) enable lower inference latency and data traffic, facilitating deployment on resource-constrained edge devices such as field-programmable gate arrays (FPGAs). However, CNN inference requires access to off-chip synchronous dynamic random-access memory (SDRAM), which significantly degrades inference speed and system power efficiency. In this paper, we propose ADS-CNN, an adaptive dataflow scheduling method for lightweight CNN accelerators on FPGAs. The key idea of ADS-CNN is to efficiently utilize on-chip resources and reduce the amount of SDRAM access. To reuse logic resources, we design a time-division multiplexing calculation engine that is integrated into ADS-CNN. We implement a configurable convolution controller module that adapts to the data reuse of different convolution layers, thus reducing off-chip accesses. Furthermore, we exploit on-chip memory blocks as buffers according to the configuration of each layer in lightweight CNNs. On the resource-constrained Intel Cyclone V SoC 5CSEBA6 FPGA platform, we evaluated six common lightweight CNN models to demonstrate the performance advantages of ADS-CNN. The evaluation results indicate that, compared with accelerators using a traditional tiling-strategy dataflow, ADS-CNN achieves up to a 1.29× speedup while compressing the overall dataflow scale by 23.7%.
AB - Lightweight convolutional neural networks (CNNs) enable lower inference latency and data traffic, facilitating deployment on resource-constrained edge devices such as field-programmable gate arrays (FPGAs). However, CNN inference requires access to off-chip synchronous dynamic random-access memory (SDRAM), which significantly degrades inference speed and system power efficiency. In this paper, we propose ADS-CNN, an adaptive dataflow scheduling method for lightweight CNN accelerators on FPGAs. The key idea of ADS-CNN is to efficiently utilize on-chip resources and reduce the amount of SDRAM access. To reuse logic resources, we design a time-division multiplexing calculation engine that is integrated into ADS-CNN. We implement a configurable convolution controller module that adapts to the data reuse of different convolution layers, thus reducing off-chip accesses. Furthermore, we exploit on-chip memory blocks as buffers according to the configuration of each layer in lightweight CNNs. On the resource-constrained Intel Cyclone V SoC 5CSEBA6 FPGA platform, we evaluated six common lightweight CNN models to demonstrate the performance advantages of ADS-CNN. The evaluation results indicate that, compared with accelerators using a traditional tiling-strategy dataflow, ADS-CNN achieves up to a 1.29× speedup while compressing the overall dataflow scale by 23.7%.
KW - Accelerator
KW - Adaptive dataflow
KW - FPGA
KW - Lightweight convolutional neural networks
KW - Tiling strategy
KW - Unified computing engine
UR - http://www.scopus.com/inward/record.url?scp=85191353934&partnerID=8YFLogxK
U2 - 10.1016/j.future.2024.04.038
DO - 10.1016/j.future.2024.04.038
M3 - Article
AN - SCOPUS:85191353934
SN - 0167-739X
VL - 158
SP - 138
EP - 149
JO - Future Generation Computer Systems
JF - Future Generation Computer Systems
ER -