TY - GEN
T1 - DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving
T2 - 18th European Conference on Computer Vision, ECCV 2024
AU - Wang, Xiaofeng
AU - Zhu, Zheng
AU - Huang, Guan
AU - Chen, Xinze
AU - Zhu, Jiagang
AU - Lu, Jiwen
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
PY - 2025
Y1 - 2025
N2 - World models, especially in autonomous driving, are drawing extensive attention due to their capacity for comprehending driving environments. An established world model holds immense potential for generating high-quality driving videos and driving policies for safe maneuvering. However, a critical limitation of relevant research lies in its predominant focus on gaming environments or simulated settings, which lack the representation of real-world driving scenarios. Therefore, we introduce DriveDreamer, a pioneering world model entirely derived from real-world driving scenarios. Given that modeling the world in intricate driving scenes entails an overwhelming search space, we propose harnessing the powerful diffusion model to construct a comprehensive representation of the complex environment. Furthermore, we introduce a two-stage training pipeline: in the initial phase, DriveDreamer acquires a deep understanding of structured traffic constraints, while the subsequent stage equips it with the ability to anticipate future states. Extensive experiments verify that DriveDreamer empowers both driving video generation and action prediction, faithfully capturing real-world traffic constraints. Moreover, videos generated by DriveDreamer significantly enhance the training of driving perception methods.
AB - World models, especially in autonomous driving, are drawing extensive attention due to their capacity for comprehending driving environments. An established world model holds immense potential for generating high-quality driving videos and driving policies for safe maneuvering. However, a critical limitation of relevant research lies in its predominant focus on gaming environments or simulated settings, which lack the representation of real-world driving scenarios. Therefore, we introduce DriveDreamer, a pioneering world model entirely derived from real-world driving scenarios. Given that modeling the world in intricate driving scenes entails an overwhelming search space, we propose harnessing the powerful diffusion model to construct a comprehensive representation of the complex environment. Furthermore, we introduce a two-stage training pipeline: in the initial phase, DriveDreamer acquires a deep understanding of structured traffic constraints, while the subsequent stage equips it with the ability to anticipate future states. Extensive experiments verify that DriveDreamer empowers both driving video generation and action prediction, faithfully capturing real-world traffic constraints. Moreover, videos generated by DriveDreamer significantly enhance the training of driving perception methods.
KW - Autonomous driving
KW - Video generation
KW - World models
UR - http://www.scopus.com/inward/record.url?scp=85210802424&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-73195-2_4
DO - 10.1007/978-3-031-73195-2_4
M3 - Conference contribution
AN - SCOPUS:85210802424
SN - 9783031731945
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 55
EP - 72
BT - Computer Vision – ECCV 2024 - 18th European Conference, Proceedings
A2 - Leonardis, Aleš
A2 - Ricci, Elisa
A2 - Roth, Stefan
A2 - Russakovsky, Olga
A2 - Sattler, Torsten
A2 - Varol, Gül
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 29 September 2024 through 4 October 2024
ER -