TY - JOUR
T1 - A Sparsity-Aware Autonomous Path Planning Accelerator with HW/SW Co-Design and Multi-Level Dataflow Optimization
AU - Zhang, Yifan
AU - Niu, Xiaoyu
AU - Tian, Hongzheng
AU - Zhang, Yanjun
AU - Yu, Bo
AU - Liu, Shaoshan
AU - Huang, Sitao
N1 - Publisher Copyright:
© 2025 Copyright held by the owner/author(s).
PY - 2025/9/19
Y1 - 2025/9/19
N2 - Path planning is a critical task for autonomous driving, aiming to generate smooth, collision-free, and feasible paths based on input perception and localization information. The planning task is both highly time-sensitive and computationally intensive, posing significant challenges to resource-constrained autonomous driving hardware. In this article, we propose an end-to-end framework for accelerating path planning on FPGA platforms. This framework focuses on accelerating quadratic programming (QP) solving, which is the core of optimization-based path planning and has the most computationally-intensive workloads. Our method leverages a hardware-friendly alternating direction method of multipliers (ADMM) to solve QP problems while employing a highly parallelizable preconditioned conjugate gradient (PCG) method for solving the associated linear systems. We analyze the sparse patterns of matrix operations in QP and design customized storage schemes along with efficient sparse matrix multiplication and sparse matrix-vector multiplication units. Our customized design significantly reduces resource consumption for data storage and computation while dramatically speeding up matrix operations. Additionally, we propose a multi-level dataflow optimization strategy. Within individual operators, we achieve acceleration through parallelization and pipelining. For different operators in an algorithm, we analyze inter-operator data dependencies to enable fine-grained pipelining. At the system level, we map different steps of the planning process to the CPU and FPGA and pipeline these steps to enhance end-to-end throughput. We implement and validate our design on the AMD ZCU102 platform. Our implementation achieves state-of-the-art performance in both latency and energy efficiency compared with existing works, including an average 1.48× speedup over the best FPGA-based design, a 2.89× speedup compared with the state-of-the-art QP solver on an Intel i7-11800H CPU, a 5.62× speedup over an ARM Cortex-A57 embedded CPU, and a 1.56× speedup over state-of-the-art GPU-based work. Furthermore, our design delivers a 2.05× improvement in throughput compared with the state-of-the-art FPGA-based design.
AB - Path planning is a critical task for autonomous driving, aiming to generate smooth, collision-free, and feasible paths based on input perception and localization information. The planning task is both highly time-sensitive and computationally intensive, posing significant challenges to resource-constrained autonomous driving hardware. In this article, we propose an end-to-end framework for accelerating path planning on FPGA platforms. This framework focuses on accelerating quadratic programming (QP) solving, which is the core of optimization-based path planning and has the most computationally-intensive workloads. Our method leverages a hardware-friendly alternating direction method of multipliers (ADMM) to solve QP problems while employing a highly parallelizable preconditioned conjugate gradient (PCG) method for solving the associated linear systems. We analyze the sparse patterns of matrix operations in QP and design customized storage schemes along with efficient sparse matrix multiplication and sparse matrix-vector multiplication units. Our customized design significantly reduces resource consumption for data storage and computation while dramatically speeding up matrix operations. Additionally, we propose a multi-level dataflow optimization strategy. Within individual operators, we achieve acceleration through parallelization and pipelining. For different operators in an algorithm, we analyze inter-operator data dependencies to enable fine-grained pipelining. At the system level, we map different steps of the planning process to the CPU and FPGA and pipeline these steps to enhance end-to-end throughput. We implement and validate our design on the AMD ZCU102 platform. Our implementation achieves state-of-the-art performance in both latency and energy efficiency compared with existing works, including an average 1.48× speedup over the best FPGA-based design, a 2.89× speedup compared with the state-of-the-art QP solver on an Intel i7-11800H CPU, a 5.62× speedup over an ARM Cortex-A57 embedded CPU, and a 1.56× speedup over state-of-the-art GPU-based work. Furthermore, our design delivers a 2.05× improvement in throughput compared with the state-of-the-art FPGA-based design.
KW - FPGA
KW - Path planning
KW - autonomous driving
KW - quadratic programming
UR - https://www.scopus.com/pages/publications/105018740144
U2 - 10.1145/3750449
DO - 10.1145/3750449
M3 - Article
AN - SCOPUS:105018740144
SN - 1544-3566
VL - 22
JO - Transactions on Architecture and Code Optimization
JF - Transactions on Architecture and Code Optimization
IS - 3
M1 - 111
ER -