TY - JOUR
T1 - Improving policy training for autonomous driving through randomized ensembled double Q-learning with Transformer encoder feature evaluation
AU - Fan, Jie
AU - Zhang, Xudong
AU - Zou, Yuan
AU - Li, Yuanyuan
AU - Liu, Yingqun
AU - Sun, Wenjing
N1 - Publisher Copyright:
© 2024
PY - 2024/12
Y1 - 2024/12
N2 - In the burgeoning field of autonomous driving, reinforcement learning (RL) has gained prominence for its adaptability and intelligent decision-making. However, conventional RL methods face challenges in efficiently extracting relevant features from high-dimensional inputs and maximizing the use of environment-agent interaction data. To surmount these obstacles, this paper introduces a novel RL-based approach that integrates randomized ensembled double Q-learning (REDQ) with a Transformer encoder. The Transformer encoder's attention mechanism is utilized to dynamically evaluate features according to their relevance in different driving scenarios. Simultaneously, the implementation of REDQ, characterized by a high update-to-data (UTD) ratio, enhances the utilization of interaction data during policy training. Notably, the incorporation of an ensemble strategy and in-target minimization in REDQ significantly improves training stability, especially under high UTD conditions. Ablation studies indicate that the Transformer encoder exhibits significantly enhanced feature extraction capabilities compared to conventional network architectures, resulting in a 13.6% to 21.4% increase in success rate for the MetaDrive autonomous driving task. Additionally, when compared to standard RL methodologies, the proposed approach demonstrates a faster rate of reward acquisition and achieves a 67.5% to 69% improvement in success rate.
AB - In the burgeoning field of autonomous driving, reinforcement learning (RL) has gained prominence for its adaptability and intelligent decision-making. However, conventional RL methods face challenges in efficiently extracting relevant features from high-dimensional inputs and maximizing the use of environment-agent interaction data. To surmount these obstacles, this paper introduces a novel RL-based approach that integrates randomized ensembled double Q-learning (REDQ) with a Transformer encoder. The Transformer encoder's attention mechanism is utilized to dynamically evaluate features according to their relevance in different driving scenarios. Simultaneously, the implementation of REDQ, characterized by a high update-to-data (UTD) ratio, enhances the utilization of interaction data during policy training. Notably, the incorporation of an ensemble strategy and in-target minimization in REDQ significantly improves training stability, especially under high UTD conditions. Ablation studies indicate that the Transformer encoder exhibits significantly enhanced feature extraction capabilities compared to conventional network architectures, resulting in a 13.6% to 21.4% increase in success rate for the MetaDrive autonomous driving task. Additionally, when compared to standard RL methodologies, the proposed approach demonstrates a faster rate of reward acquisition and achieves a 67.5% to 69% improvement in success rate.
KW - Autonomous driving
KW - Reinforcement learning
KW - Transformer encoder
UR - http://www.scopus.com/inward/record.url?scp=85208069520&partnerID=8YFLogxK
U2 - 10.1016/j.asoc.2024.112386
DO - 10.1016/j.asoc.2024.112386
M3 - Article
AN - SCOPUS:85208069520
SN - 1568-4946
VL - 167
JO - Applied Soft Computing
JF - Applied Soft Computing
M1 - 112386
ER -