TY - JOUR
T1 - ADS-CNN
T2 - Adaptive Dataflow Scheduling for lightweight CNN accelerator on FPGAs
AU - Wan, Yi
AU - Xie, Xianzhong
AU - Chen, Junfan
AU - Xie, Kunpeng
AU - Yi, Dezhi
AU - Lu, Ye
AU - Gai, Keke
N1 - Publisher Copyright:
© 2024 Elsevier B.V.
PY - 2024/9
Y1 - 2024/9
N2 - Lightweight convolutional neural networks (CNNs) enable lower inference latency and data traffic, facilitating deployment on resource-constrained edge devices such as field-programmable gate arrays (FPGAs). However, CNN inference requires access to off-chip synchronous dynamic random-access memory (SDRAM), which significantly degrades inference speed and system power efficiency. In this paper, we propose ADS-CNN, an adaptive dataflow scheduling method for lightweight CNN accelerators on FPGAs. The key idea of ADS-CNN is to efficiently utilize on-chip resources and reduce the amount of SDRAM access. To reuse logic resources, we design a time-division multiplexing calculation engine that is integrated into ADS-CNN. We implement a configurable convolution controller module that adapts to the data reuse of different convolution layers, thus reducing off-chip accesses. Furthermore, we exploit on-chip memory blocks as buffers according to the configuration of each layer in lightweight CNNs. On the resource-constrained Intel Cyclone V SoC 5CSEBA6 FPGA platform, we evaluated six common lightweight CNN models to demonstrate the performance advantages of ADS-CNN. The evaluation results indicate that, compared with accelerators using a traditional tiling-strategy dataflow, ADS-CNN achieves up to a 1.29× speedup while compressing the overall dataflow scale by 23.7%.
AB - Lightweight convolutional neural networks (CNNs) enable lower inference latency and data traffic, facilitating deployment on resource-constrained edge devices such as field-programmable gate arrays (FPGAs). However, CNN inference requires access to off-chip synchronous dynamic random-access memory (SDRAM), which significantly degrades inference speed and system power efficiency. In this paper, we propose ADS-CNN, an adaptive dataflow scheduling method for lightweight CNN accelerators on FPGAs. The key idea of ADS-CNN is to efficiently utilize on-chip resources and reduce the amount of SDRAM access. To reuse logic resources, we design a time-division multiplexing calculation engine that is integrated into ADS-CNN. We implement a configurable convolution controller module that adapts to the data reuse of different convolution layers, thus reducing off-chip accesses. Furthermore, we exploit on-chip memory blocks as buffers according to the configuration of each layer in lightweight CNNs. On the resource-constrained Intel Cyclone V SoC 5CSEBA6 FPGA platform, we evaluated six common lightweight CNN models to demonstrate the performance advantages of ADS-CNN. The evaluation results indicate that, compared with accelerators using a traditional tiling-strategy dataflow, ADS-CNN achieves up to a 1.29× speedup while compressing the overall dataflow scale by 23.7%.
KW - Accelerator
KW - Adaptive dataflow
KW - FPGA
KW - Lightweight convolutional neural networks
KW - Tiling strategy
KW - Unified computing engine
UR - http://www.scopus.com/inward/record.url?scp=85191353934&partnerID=8YFLogxK
U2 - 10.1016/j.future.2024.04.038
DO - 10.1016/j.future.2024.04.038
M3 - Article
AN - SCOPUS:85191353934
SN - 0167-739X
VL - 158
SP - 138
EP - 149
JO - Future Generation Computer Systems
JF - Future Generation Computer Systems
ER -