CORB-Planner: Corridor as Observations for RL Planning in High-Speed Flight

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

Reinforcement learning (RL) has shown promise in many robotic control tasks. However, its deployment in unmanned aerial vehicles (UAVs) remains challenging, mainly because it relies on accurate dynamic models and platform-specific sensing, which hinders cross-platform transfer. This article presents the corridor-as-observations for RL B-spline (CORB)-planner, a real-time, RL-based trajectory planning framework for high-speed autonomous UAV flight across heterogeneous platforms. The key idea is to combine B-spline trajectory generation, in which the RL policy produces successive control points, with a compact safe flight corridor (SFC) representation obtained via heuristic search. The SFC abstracts obstacle information in a low-dimensional form, mitigating overfitting to platform-specific details and reducing sensitivity to model inaccuracies. To narrow the sim-to-real gap, we adopt an easy-to-hard progressive training pipeline in simulation. A value-based soft decomposed-critic Q algorithm is used to learn effective policies within approximately 10 min of training. Benchmarks in simulation and real-world tests demonstrate real-time planning on lightweight onboard hardware and support maximum flight speeds of up to 8.2 m/s in dense and cluttered environments without external positioning. Compatibility with various UAV configurations (quadrotors and hexarotors) and modest onboard compute underlines the generality and robustness of CORB-planner for practical deployment.
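
The idea of building a B-spline trajectory from successive control points can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a uniform cubic B-spline, and it stands in for the safe flight corridor with a single axis-aligned box, which is a simplification chosen here for illustration. The names `cubic_bspline_point`, `inside_box`, and `append_control_point` are hypothetical, and in a real system the new control point would come from the learned policy rather than being supplied by hand. The sketch relies on a standard property: each B-spline segment lies in the convex hull of its four control points, so if those points are inside a convex corridor, the segment is too.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def cubic_bspline_point(ctrl: Sequence[Point], u: float) -> Point:
    """Evaluate one uniform cubic B-spline segment at u in [0, 1]."""
    if len(ctrl) != 4:
        raise ValueError("a cubic segment needs exactly 4 control points")
    # Uniform cubic B-spline basis functions; they sum to 1 for any u.
    weights = (
        (1.0 - u) ** 3 / 6.0,
        (3.0 * u**3 - 6.0 * u**2 + 4.0) / 6.0,
        (-3.0 * u**3 + 3.0 * u**2 + 3.0 * u + 1.0) / 6.0,
        u**3 / 6.0,
    )
    return tuple(sum(w * p[axis] for w, p in zip(weights, ctrl)) for axis in range(3))


def inside_box(p: Point, lo: Point, hi: Point) -> bool:
    """Check that p lies in the box [lo, hi] (assumed stand-in for a corridor)."""
    return all(l <= x <= h for x, l, h in zip(p, lo, hi))


def append_control_point(
    ctrl_pts: List[Point], new_pt: Point, lo: Point, hi: Point, samples: int = 5
) -> List[Point]:
    """Append new_pt and return samples of the newest segment.

    new_pt plays the role of one policy output (hypothetical here).
    Requires samples >= 2. Returns [] until four control points exist.
    """
    if not inside_box(new_pt, lo, hi):
        raise ValueError("control point lies outside the corridor box")
    ctrl_pts.append(new_pt)
    if len(ctrl_pts) < 4:
        return []
    window = ctrl_pts[-4:]  # the last four points define the newest segment
    return [cubic_bspline_point(window, i / (samples - 1)) for i in range(samples)]
```

Calling `append_control_point` once per planning step extends the trajectory by one segment, and only the last four control points are needed to evaluate it, which is one reason a control-point output suits incremental replanning.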

Original language: English
Pages (from-to): 4070-4080
Number of pages: 11
Journal: IEEE/ASME Transactions on Mechatronics
Volume: 30
Issue number: 6
Publication status: Published - 2025
Externally published: Yes

Keywords

  • Aerial systems
  • agile flight
  • reinforcement learning
  • trajectory planning
