TY - GEN
T1 - Human-Guided Reinforcement Learning Using Multi Q-Advantage for End-to-End Autonomous Driving
AU - Wang, Pei
AU - Wang, Yong
AU - He, Hongwen
AU - Wu, Jingda
AU - Kuang, Zirui
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Reinforcement learning (RL) is a promising approach for end-to-end autonomous driving. However, training an RL policy for autonomous driving is challenging, requiring a meticulously crafted reward function and a large number of environment interactions. This motivates integrating RL with human guidance, combining the strengths of machine learning with human expertise. Herein, we propose a novel human-guided reinforcement learning (HGRL) algorithm for policy training in end-to-end autonomous driving. Our HGRL method employs several mechanisms to enhance the effectiveness of low-intensity human guidance, including a regressed human policy model, a multi-Q-advantage technique, and prioritized human experience replay. The proposed method is evaluated on a challenging lane-changing and overtaking driving task using only small neural networks and image inputs. Simulation results show that our method surpasses current state-of-the-art human-guided RL algorithms in both driving performance and generalization. Furthermore, the HGRL policy trained in simulation is transferred to a real-world autonomous vehicle.
AB - Reinforcement learning (RL) is a promising approach for end-to-end autonomous driving. However, training an RL policy for autonomous driving is challenging, requiring a meticulously crafted reward function and a large number of environment interactions. This motivates integrating RL with human guidance, combining the strengths of machine learning with human expertise. Herein, we propose a novel human-guided reinforcement learning (HGRL) algorithm for policy training in end-to-end autonomous driving. Our HGRL method employs several mechanisms to enhance the effectiveness of low-intensity human guidance, including a regressed human policy model, a multi-Q-advantage technique, and prioritized human experience replay. The proposed method is evaluated on a challenging lane-changing and overtaking driving task using only small neural networks and image inputs. Simulation results show that our method surpasses current state-of-the-art human-guided RL algorithms in both driving performance and generalization. Furthermore, the HGRL policy trained in simulation is transferred to a real-world autonomous vehicle.
KW - autonomous driving
KW - human guidance
KW - Q-advantage integration
KW - reinforcement learning
KW - sim-to-real transfer
UR - http://www.scopus.com/inward/record.url?scp=85217260370&partnerID=8YFLogxK
U2 - 10.1109/CVCI63518.2024.10830150
DO - 10.1109/CVCI63518.2024.10830150
M3 - Conference contribution
AN - SCOPUS:85217260370
T3 - Proceedings of the 2024 8th CAA International Conference on Vehicular Control and Intelligence, CVCI 2024
BT - Proceedings of the 2024 8th CAA International Conference on Vehicular Control and Intelligence, CVCI 2024
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 8th CAA International Conference on Vehicular Control and Intelligence, CVCI 2024
Y2 - 25 October 2024 through 27 October 2024
ER -