Large language model guided deep reinforcement learning for safe autonomous vehicle decision making

  • Hao Pang
  • Zhenpo Wang
  • Guoqiang Li*
  • *Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Deep reinforcement learning (DRL) has shown promising potential for decision-making in autonomous driving. However, it requires extensive interaction with the environment and generally has low learning efficiency. To address these challenges, this paper proposes a novel large language model (LLM) guided deep reinforcement learning (LGDRL) framework for the decision-making problem in autonomous driving. Leveraging the powerful reasoning capabilities of LLMs, an LLM-based driving expert is designed to provide intelligent guidance in the DRL learning process. Subsequently, an innovative expert policy constrained algorithm and a novel LLM-intervened interaction mechanism are developed to efficiently integrate the guidance from the LLM expert to enhance the performance of DRL decision-making policies. Extensive experiments are conducted to evaluate the performance of the proposed LGDRL method. The results demonstrate that our proposed method effectively leverages expert guidance to enhance both learning efficiency and performance of DRL, achieving superior driving performance. Moreover, it enables the DRL agent to maintain consistent and reliable performance in the absence of LLM expert guidance, which is promising for real-world applications. The supplementary videos are available at https://bitmobility.github.io/LGDRL/.
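The abstract describes the expert-guided training only at a high level. The sketch below is a hypothetical illustration, not the authors' LGDRL algorithm. It assumes a discrete action set (for example lane-keep, change-left, change-right), models the LLM-intervened interaction as executing the LLM expert's suggested action with an assumed probability `intervene_prob`, and models the expert policy constraint as a cross-entropy penalty, weighted by an assumed coefficient `lam`, added to a plain policy-gradient loss. All function names and parameters are illustrative assumptions.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

# Hypothetical sketch only: the intervention rule and the loss form are
# illustrative assumptions, not the published LGDRL algorithm.


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over discrete action logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def choose_action(
    agent_action: int,
    expert_action: Optional[int],
    intervene_prob: float,
    rng: random.Random,
) -> Tuple[int, bool]:
    """Assumed LLM-intervened interaction: when the LLM expert offers an
    action, execute it instead of the agent's with probability intervene_prob.
    Returns (executed_action, intervened_flag)."""
    if expert_action is not None and rng.random() < intervene_prob:
        return expert_action, True
    return agent_action, False


def constrained_policy_loss(
    logits: Sequence[float],
    action: int,
    advantage: float,
    expert_action: Optional[int],
    lam: float,
) -> float:
    """Assumed expert-constrained objective: policy-gradient loss plus
    lam times the cross-entropy between the policy and the expert action."""
    probs = softmax(logits)
    pg_loss = -advantage * math.log(probs[action])
    if expert_action is None:
        # No expert guidance available: fall back to the plain DRL loss.
        return pg_loss
    expert_penalty = -math.log(probs[expert_action])
    return pg_loss + lam * expert_penalty


if __name__ == "__main__":
    rng = random.Random(0)
    executed, intervened = choose_action(0, 2, intervene_prob=1.0, rng=rng)
    print(executed, intervened)
    print(constrained_policy_loss([0.0, 0.0, 0.0], 0, 1.0, 1, lam=0.5))
```

With uniform logits over three actions, an advantage of 1 and `lam = 0.5`, both terms equal ln 3, so the loss is 1.5 ln 3. Passing `expert_action=None` (or `intervene_prob=0`) corresponds to running the agent without LLM guidance, the deployment setting the abstract emphasizes.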

Original language: English
Article number: 105511
Journal: Transportation Research Part C: Emerging Technologies
Volume: 184
Publication status: Published - Mar 2026
Externally published: Yes

Keywords

  • Autonomous vehicle
  • Decision-making
  • Deep reinforcement learning
  • Large language models
