TY - GEN
T1 - Learning Aerial Docking via Offline-to-Online Reinforcement Learning
AU - Yang, Tao
AU - Feng, Yuting
AU - Yu, Yushu
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - In this study, we address the challenging task of docking two quadrotors in flight using a combination of offline and online reinforcement learning. The task is prone to catastrophic forgetting because it involves multiple objectives: precise control of the quadrotors' trajectories, stability during the docking maneuver, and stable control after docking. This complexity makes it difficult to apply reinforcement learning directly and often leads to failure. To address these challenges, we first developed a rule-based expert controller and collected a substantial dataset. We then used offline reinforcement learning to train a guided policy, which we subsequently fine-tuned with online reinforcement learning. This approach effectively mitigates the out-of-distribution issues that typically arise when guided policies are fine-tuned online. Notably, our method raised the success rate of the expert strategy from 40% to 95%.
AB - In this study, we address the challenging task of docking two quadrotors in flight using a combination of offline and online reinforcement learning. The task is prone to catastrophic forgetting because it involves multiple objectives: precise control of the quadrotors' trajectories, stability during the docking maneuver, and stable control after docking. This complexity makes it difficult to apply reinforcement learning directly and often leads to failure. To address these challenges, we first developed a rule-based expert controller and collected a substantial dataset. We then used offline reinforcement learning to train a guided policy, which we subsequently fine-tuned with online reinforcement learning. This approach effectively mitigates the out-of-distribution issues that typically arise when guided policies are fine-tuned online. Notably, our method raised the success rate of the expert strategy from 40% to 95%.
KW - control
KW - quadrotor
KW - reinforcement learning
UR - http://www.scopus.com/inward/record.url?scp=85199639383&partnerID=8YFLogxK
U2 - 10.1109/ICCCR61138.2024.10585395
DO - 10.1109/ICCCR61138.2024.10585395
M3 - Conference contribution
AN - SCOPUS:85199639383
T3 - 2024 4th International Conference on Computer, Control and Robotics, ICCCR 2024
SP - 305
EP - 309
BT - 2024 4th International Conference on Computer, Control and Robotics, ICCCR 2024
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 4th International Conference on Computer, Control and Robotics, ICCCR 2024
Y2 - 19 April 2024 through 21 April 2024
ER -