Abstract
Reinforcement learning (RL) has shown promise in a large number of robotic control tasks. However, its deployment on unmanned aerial vehicles (UAVs) remains challenging, mainly because of the reliance on accurate dynamic models and platform-specific sensing, which hinders cross-platform transfer. This article presents the corridor-as-observations for RL B-spline (CORB)-planner, a real-time, RL-based trajectory planning framework for high-speed autonomous UAV flight across heterogeneous platforms. The key idea is to combine B-spline trajectory generation—with the RL policy producing successive control points—with a compact safe flight corridor (SFC) representation obtained via heuristic search. The SFC abstracts obstacle information in a low-dimensional form, mitigating overfitting to platform-specific details and reducing sensitivity to model inaccuracies. To narrow the sim-to-real gap, we adopt an easy-to-hard progressive training pipeline in simulation. A value-based soft decomposed-critic Q algorithm is used to learn effective policies within approximately 10 min of training. Benchmarks in simulation and real-world tests demonstrate real-time planning on lightweight onboard hardware, with flight speeds of up to 8.2 m/s in dense, cluttered environments without external positioning. Compatibility with various UAV configurations (quadrotors and hexarotors) and modest onboard compute underline the generality and robustness of CORB-planner for practical deployment.
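To illustrate the B-spline trajectory representation the abstract refers to, the sketch below evaluates one segment of a uniform cubic B-spline from four consecutive control points. This is a generic textbook construction, not the paper's implementation; in the CORB-planner setting the control points would be emitted sequentially by the RL policy, but the function names and array shapes here are illustrative assumptions.

```python
import numpy as np

def cubic_bspline_point(ctrl_pts, u):
    """Evaluate a uniform cubic B-spline segment at parameter u in [0, 1].

    ctrl_pts: four consecutive control points, shape (4, dim).
    Returns the trajectory point as a length-dim array.
    """
    # Standard basis matrix of the uniform cubic B-spline (1/6 scaling).
    M = (1.0 / 6.0) * np.array([
        [ 1,  4,  1, 0],
        [-3,  0,  3, 0],
        [ 3, -6,  3, 0],
        [-1,  3, -3, 1],
    ])
    U = np.array([1.0, u, u**2, u**3])
    # Basis weights sum to 1 for any u (partition of unity), so the
    # curve stays inside the convex hull of its control points --
    # the property that makes corridor-based safety checks tractable.
    return U @ M @ np.asarray(ctrl_pts, dtype=float)
```

Because the curve lies in the convex hull of each sliding window of four control points, constraining those control points to a safe flight corridor is sufficient to keep the whole trajectory collision-free, which is one standard motivation for pairing B-splines with SFC representations.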
| Original language | English |
|---|---|
| Pages (from-to) | 4070-4080 |
| Number of pages | 11 |
| Journal | IEEE/ASME Transactions on Mechatronics |
| Volume | 30 |
| Issue number | 6 |
| DOIs | |
| Publication status | Published - 2025 |
| Externally published | Yes |
Keywords
- Aerial systems
- agile flight
- reinforcement learning
- trajectory planning