
OmniDrive: A Holistic Vision-Language Dataset for Autonomous Driving with Counterfactual Reasoning

  • Shihao Wang
  • Zhiding Yu*
  • Xiaohui Jiang
  • Shiyi Lan
  • Min Shi
  • Nadine Chang
  • Jan Kautz
  • Ying Li
  • Jose M. Alvarez

*Corresponding author for this work

  • NVIDIA
  • Hong Kong Polytechnic University
  • Beijing Institute of Technology

Research output: Contribution to journal, Conference article, peer-reviewed

Abstract

The advances in vision-language models (VLMs) have led to a growing interest in autonomous driving to leverage their strong reasoning capabilities. However, extending these capabilities from 2D to full 3D understanding is crucial for real-world applications. To address this challenge, we propose OmniDrive, a holistic vision-language dataset that aligns agent models with 3D driving tasks through counterfactual reasoning. This approach enhances decision-making by evaluating potential scenarios and their outcomes, similar to human drivers considering alternative actions. Our counterfactual-based synthetic data annotation process generates large-scale, high-quality datasets, providing denser supervision signals that bridge planning trajectories and language-based reasoning. Further, we explore two advanced OmniDrive-Agent frameworks, namely Omni-L and Omni-Q, to assess the importance of vision-language alignment versus 3D perception, revealing critical insights into designing effective LLM-agents. Significant improvements on the DriveLM Q&A benchmark and nuScenes open-loop planning demonstrate the effectiveness of our dataset and methods.

Original language: English
Pages (from-to): 22442-22452
Number of pages: 11
Journal: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Publication status: Published - 2025
Event: 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2025 - Nashville, United States
Duration: 11 Jun 2025 to 15 Jun 2025

Keywords

  • vlm; autonomous driving; dataset
