Software-Hardware Co-Optimized High-Throughput CNN Acceleration Architecture for on-board Remote Sensing Based on Distribution Difference-Aware Mixed-Precision Quantization

  • Haitao Chen
  • He Chen
  • Ning Zhang*
  • Tong Wang
  • Shuo Ni
  • Liang Chen
  • *Corresponding author for this work
  • Beijing Institute of Technology
  • Hong Kong Polytechnic University

Research output: Contribution to journal › Article › peer-review

Abstract

Convolutional Neural Networks (CNNs) have demonstrated outstanding performance in a variety of intelligent remote sensing tasks in recent years. However, scenarios such as on-board emergency response require on-board processing on the order of seconds, with both low latency and high accuracy, while the limited resources and power budgets of on-board platforms make it difficult to handle the large parameter counts and intensive computation of CNNs efficiently on-chip. To address these issues, this paper proposes a real-time on-board CNN acceleration architecture based on Distribution Difference-aware Mixed-precision Quantization (DDAMQ), achieving efficient deployment through collaborative hardware and software optimization. At the algorithm level, low-bit quantization is employed to reduce computational and resource overhead. Furthermore, feature distribution difference modeling and outlier suppression are used to adaptively optimize the quantization bit width and pruning threshold, significantly mitigating the accuracy loss caused by low-bit quantization and improving model performance. At the hardware level, the computation and storage structures are optimized to improve resource utilization and reduce overall power consumption, making the architecture better suited to resource-constrained, real-time intelligent processing tasks. A resource consumption evaluation model is constructed to guide efficient deployment within the on-chip resource constraints of FPGAs. Experiments with four mainstream CNN models on five typical datasets are conducted and validated on an AMD-Xilinx VC709 development board. The results show that, at a similar compression ratio, the proposed DDAMQ strategy achieves superior classification performance; compared with recent state-of-the-art work, the designed CNN acceleration architecture improves throughput and energy efficiency by at least 2.17× and 2.13×, respectively.
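
To make the trade-off behind low-bit quantization and outlier suppression concrete, the following is a minimal sketch of a generic symmetric uniform quantizer with percentile clipping. It is not the DDAMQ algorithm described in the abstract: the function names `quantize` and `mse`, the `clip_percentile` parameter, and the synthetic data are all assumptions chosen for illustration.

```python
import random


def quantize(values, bits, clip_percentile=100.0):
    """Symmetric uniform fake-quantization (hypothetical helper).

    Values are clipped at the given percentile of |x| and then snapped
    to a signed grid with 2**(bits - 1) - 1 positive levels.
    """
    if bits < 2:
        raise ValueError("bits must be >= 2 for a signed symmetric grid")
    magnitudes = sorted(abs(v) for v in values)
    # Clipping threshold: chosen percentile of |x| (100 means no clipping).
    idx = min(len(magnitudes) - 1, int(len(magnitudes) * clip_percentile / 100.0))
    threshold = magnitudes[idx] or 1.0  # avoid a zero scale on all-zero input
    qmax = 2 ** (bits - 1) - 1
    scale = threshold / qmax
    out = []
    for v in values:
        q = round(v / scale)            # nearest grid index
        q = max(-qmax, min(qmax, q))    # saturate anything beyond the threshold
        out.append(q * scale)           # map back to the real-valued domain
    return out


def mse(a, b):
    """Mean squared error between two equally long sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Synthetic "feature map": Gaussian inliers plus a few large outliers (assumed data).
random.seed(0)
inliers = [random.gauss(0.0, 1.0) for _ in range(1000)]
data = inliers + [40.0, -45.0, 50.0]
n = len(inliers)

# 4-bit quantization with and without clipping, scored on the inliers only.
err_plain = mse(inliers, quantize(data, bits=4)[:n])
err_clipped = mse(inliers, quantize(data, bits=4, clip_percentile=99.0)[:n])
print("4-bit inlier MSE, no clipping:", err_plain)
print("4-bit inlier MSE, 99th-percentile clipping:", err_clipped)
```

Without clipping, the few outliers stretch the quantization range and leave the bulk of the distribution with very coarse steps; clipping restores resolution for the inliers at the cost of saturating the outliers. How the paper actually models distribution differences and selects per-layer bit widths is not specified in the abstract, so this sketch makes no claim about those details.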

Keywords

  • Bit width allocation
  • Convolutional neural network
  • FPGA-based accelerator
  • Low bit-width quantization
  • Software-hardware co-optimization
