Improving policy training for autonomous driving through randomized ensembled double Q-learning with Transformer encoder feature evaluation

Jie Fan, Xudong Zhang*, Yuan Zou, Yuanyuan Li, Yingqun Liu, Wenjing Sun

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

In the burgeoning field of autonomous driving, reinforcement learning (RL) has gained prominence for its adaptability and intelligent decision-making. However, conventional RL methods face challenges in efficiently extracting relevant features from high-dimensional inputs and in making full use of environment-agent interaction data. To surmount these obstacles, this paper introduces a novel RL-based approach that integrates randomized ensembled double Q-learning (REDQ) with a Transformer encoder. The Transformer encoder's attention mechanism is used to dynamically evaluate features according to their relevance in different driving scenarios, while REDQ, characterized by a high update-to-data (UTD) ratio, improves the utilization of interaction data during policy training. In particular, the ensemble strategy and in-target minimization in REDQ significantly improve training stability, especially under high UTD conditions. Ablation studies indicate that the Transformer encoder extracts features markedly better than conventional network architectures, yielding a 13.6% to 21.4% increase in success rate on the MetaDrive autonomous driving task. Compared with standard RL methods, the proposed approach also acquires reward faster and achieves a 67.5% to 69% improvement in success rate.
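The two ingredients described above lend themselves to a compact illustration: an attention-based encoder that re-weights observation features, and REDQ-style critic updates that take the minimum over a random subset of an ensemble while performing several gradient updates per environment step. The PyTorch sketch below is only an assumption-laden reading of that description, not the authors' implementation; the per-feature tokenization, network sizes, ensemble size N, in-target subset size m, and UTD ratio G are hypothetical choices made purely for illustration.

import random
import torch
import torch.nn as nn

class TransformerFeatureEncoder(nn.Module):
    # Treats each scalar observation feature as a token so that self-attention
    # can re-weight features by relevance (assumed design, for illustration only).
    def __init__(self, obs_dim, d_model=32, n_heads=4):
        super().__init__()
        self.embed = nn.Linear(1, d_model)                # one token per feature
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.out_dim = obs_dim * d_model

    def forward(self, obs):
        tokens = self.embed(obs.unsqueeze(-1))            # (B, obs_dim, d_model)
        return self.encoder(tokens).flatten(1)            # (B, obs_dim * d_model)

def redq_target(critics, encoder, next_obs, next_act, reward, done,
                gamma=0.99, m=2):
    # REDQ in-target minimization: min over a random subset of m critics.
    with torch.no_grad():
        feats = encoder(next_obs)
        subset = random.sample(range(len(critics)), m)
        q_next = torch.min(torch.stack(
            [critics[i](torch.cat([feats, next_act], dim=-1)) for i in subset]
        ), dim=0).values
        return reward + gamma * (1.0 - done) * q_next

# Hypothetical sizes: N = 10 critics, in-target subset m = 2, UTD ratio G = 5.
obs_dim, act_dim, N, G = 8, 2, 10, 5
encoder = TransformerFeatureEncoder(obs_dim)
critics = [nn.Linear(encoder.out_dim + act_dim, 1) for _ in range(N)]
opts = [torch.optim.Adam(c.parameters(), lr=3e-4) for c in critics]

# A stand-in replay batch; a real agent would sample this from its buffer.
B = 64
batch = dict(obs=torch.randn(B, obs_dim), act=torch.randn(B, act_dim),
             rew=torch.randn(B, 1), done=torch.zeros(B, 1),
             next_obs=torch.randn(B, obs_dim), next_act=torch.randn(B, act_dim))

for _ in range(G):                           # high UTD: G critic updates per env step
    y = redq_target(critics, encoder, batch["next_obs"], batch["next_act"],
                    batch["rew"], batch["done"])
    feats = encoder(batch["obs"]).detach()   # encoder training omitted for brevity
    for critic, opt in zip(critics, opts):   # every critic regresses to the shared target
        loss = ((critic(torch.cat([feats, batch["act"]], dim=-1)) - y) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()

In the full method the encoder and policy would be trained jointly with the critics; only the critic update and the in-target minimum are shown here.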

Original language: English
Article number: 112386
Journal: Applied Soft Computing
Volume: 167
Publication status: Published - Dec 2024

Keywords

  • Autonomous driving
  • Reinforcement learning
  • Transformer encoder
