跳到主要导航 跳到搜索 跳到主要内容

A Sparsity-Aware Autonomous Path Planning Accelerator with HW/SW Co-Design and Multi-Level Dataflow Optimization

  • Yifan Zhang
  • , Xiaoyu Niu
  • , Hongzheng Tian
  • , Yanjun Zhang
  • , Bo Yu*
  • , Shaoshan Liu
  • , Sitao Huang*
  • *此作品的通讯作者
  • University of California at Irvine
  • Beijing Institute of Technology
  • Shenzhen Institute of Artificial Intelligence and Robotics for Society

科研成果: 期刊稿件文章同行评审

摘要

Path planning is a critical task for autonomous driving, aiming to generate smooth, collision-free, and feasible paths based on input perception and localization information. The planning task is both highly time-sensitive and computationally intensive, posing significant challenges to resource-constrained autonomous driving hardware. In this article, we propose an end-to-end framework for accelerating path planning on FPGA platforms. This framework focuses on accelerating quadratic programming (QP) solving, which is the core of optimization-based path planning and has the most computationally-intensive workloads. Our method leverages a hardware-friendly alternating direction method of multipliers (ADMM) to solve QP problems while employing a highly parallelizable preconditioned conjugate gradient (PCG) method for solving the associated linear systems. We analyze the sparse patterns of matrix operations in QP and design customized storage schemes along with efficient sparse matrix multiplication and sparse matrix-vector multiplication units. Our customized design significantly reduces resource consumption for data storage and computation while dramatically speeding up matrix operations. Additionally, we propose a multi-level dataflow optimization strategy. Within individual operators, we achieve acceleration through parallelization and pipelining. For different operators in an algorithm, we analyze inter-operator data dependencies to enable fine-grained pipelining. At the system level, we map different steps of the planning process to the CPU and FPGA and pipeline these steps to enhance end-to-end throughput. We implement and validate our design on the AMD ZCU102 platform. Our implementation achieves state-of-the-art performance in both latency and energy efficiency compared with existing works, including an average 1.48× speedup over the best FPGA-based design, a 2.89× speedup compared with the state-of-the-art QP solver on an Intel i7-11800H CPU, a 5.62× speedup over an ARM Cortex-A57 embedded CPU, and a 1.56× speedup over state-of-the-art GPU-based work. Furthermore, our design delivers a 2.05× improvement in throughput compared with the state-of-the-art FPGA-based design.

源语言英语
文章编号111
期刊Transactions on Architecture and Code Optimization
22
3
DOI
出版状态已出版 - 19 9月 2025

指纹

探究 'A Sparsity-Aware Autonomous Path Planning Accelerator with HW/SW Co-Design and Multi-Level Dataflow Optimization' 的科研主题。它们共同构成独一无二的指纹。

引用此