RAG-SafeAdapt: a multimodal retrieval-augmented model for safety and interpretability in autonomous driving

  • Mingyi Li
  • Jiayin Li
  • Hui Liu*
  • Lijin Han
  • Haitao Li
  • *Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

To address transparency and trust concerns in autonomous driving systems, which are often criticized as black-box models, we propose RAG-SafeAdapt, a retrieval-augmented vision-language model designed for complex traffic scenarios. This end-to-end framework integrates the CARLA simulator with Retrieval-Augmented Generation (RAG) and multimodal knowledge bases. The system combines visual and language inputs and applies Responsibility-Sensitive Safety (RSS) principles together with Vision-Language Models (VLMs) to generate game-theory-based safety recommendations. RAG-SafeAdapt enhances safety analysis and interpretable decision-making and is compatible with deployment on platforms such as NVIDIA Orin. Evaluation on datasets including Berkeley DeepDrive eXplanation (BDD-X) and NuScenes-QA demonstrates improved decision explainability and generalization across diverse driving scenarios. Experimental results show superior zero-shot generalization, enhanced transparency, and a collision rate reduced to 0.35%. The framework addresses key challenges in navigation clarity, sensor precision, and adaptability, and fosters trust in autonomous driving through optimized real-time safety and human-readable explanations.
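The abstract does not state which RSS rules or parameter values RAG-SafeAdapt uses. As a point of reference only, the sketch below implements the minimum safe longitudinal distance from the original RSS formulation (Shalev-Shwartz et al., 2017): d_min = [v_r·ρ + ½·a_max,accel·ρ² + (v_r + ρ·a_max,accel)² / (2·a_min,brake) − v_f² / (2·a_max,brake)]₊. The names `RSSParams`, `rss_safe_longitudinal_distance`, and `is_longitudinally_safe`, as well as the example parameter values, are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RSSParams:
    """Hypothetical RSS parameters (SI units); not the paper's values."""
    response_time: float    # rho: rear vehicle's response time, s
    max_accel: float        # a_max,accel: worst-case rear acceleration during rho, m/s^2
    min_brake_rear: float   # a_min,brake: rear vehicle's guaranteed braking, m/s^2
    max_brake_front: float  # a_max,brake: front vehicle's hardest possible braking, m/s^2


def rss_safe_longitudinal_distance(v_rear: float, v_front: float, p: RSSParams) -> float:
    """Minimum safe gap (m) between a rear and a front car driving in the same direction."""
    if p.min_brake_rear <= 0 or p.max_brake_front <= 0:
        raise ValueError("braking decelerations must be positive")
    rho = p.response_time
    # Rear car's speed after accelerating at the worst case for the whole response time.
    v_after_response = v_rear + rho * p.max_accel
    d = (
        v_rear * rho                                      # distance covered while responding
        + 0.5 * p.max_accel * rho ** 2                    # extra distance from worst-case acceleration
        + v_after_response ** 2 / (2 * p.min_brake_rear)  # rear car's braking distance
        - v_front ** 2 / (2 * p.max_brake_front)          # front car's braking distance
    )
    return max(d, 0.0)  # the [x]_+ operator: a negative requirement means any gap is safe


def is_longitudinally_safe(gap: float, v_rear: float, v_front: float, p: RSSParams) -> bool:
    """True if the current gap meets the RSS longitudinal requirement."""
    return gap >= rss_safe_longitudinal_distance(v_rear, v_front, p)


if __name__ == "__main__":
    params = RSSParams(response_time=0.5, max_accel=2.0, min_brake_rear=4.0, max_brake_front=8.0)
    print(rss_safe_longitudinal_distance(20.0, 20.0, params))  # 40.375 m
    print(is_longitudinally_safe(30.0, 20.0, 20.0, params))    # False
```

With these illustrative parameters, two cars both travelling at 20 m/s need a gap of about 40.4 m. Returning a plain boolean check alongside the distance makes it easy to feed the result into a downstream recommendation or explanation step, but how RAG-SafeAdapt actually combines RSS with its VLM outputs is described only in the full paper.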

Original language: English
Pages (from-to): 36-51
Number of pages: 16
Journal: Proceedings of the Institution of Mechanical Engineers, Part D: Journal of Automobile Engineering
Volume: 240
Issue number: 1
DOIs
Publication status: Published - Jan 2026
Externally published: Yes

Keywords

  • Retrieval-augmented generation
  • autonomous driving
  • explainability
  • responsibility-sensitive safety
  • sensor fusion
  • vision-language models
