BiVIBEV: Bimodal Vehicle-Infrastructure Cooperative Perception via Unified BEV Representation

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Infrastructure sensors provide broader perception ranges and fewer occlusions, making vehicle-infrastructure cooperative perception increasingly vital. Existing methods mostly use LiDAR point clouds from both vehicles and infrastructure to generate Bird's Eye View (BEV) representations, followed by adaptive fusion. Despite their effectiveness, these LiDAR-based approaches neglect the semantic richness and textural detail of camera information, which limits the deployment of vehicle-infrastructure cooperative systems in practical scenarios. We propose BiVIBEV, a novel framework for vehicle-infrastructure cooperative 3D object detection. We design the Vehicle-Infrastructure Spatial Cross-Attention (VI-SCA) module to obtain unified image BEV features for subsequent fusion. The module uses LiDAR BEV representations from both vehicles and infrastructure as geometric priors and interacts iteratively with the corresponding vehicle-view and infrastructure-view images. We also incorporate an occupancy supervision branch that provides denser geometric-level supervision for the BEV features. Experimental results on the DAIR-V2X-C dataset demonstrate that BiVIBEV surpasses existing methods and achieves state-of-the-art (SOTA) performance.
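The interaction described in the abstract, in which LiDAR BEV features act as queries over image features, can be illustrated with a small NumPy sketch. This is not the authors' VI-SCA implementation: it uses plain dense single-head attention in place of whatever spatial sampling the paper uses, assumes both BEV grids are already aligned in a shared frame, and uses an element-wise mean as a placeholder for the fusion step. All names (`cross_attention`, `refine_bev`, the toy sizes) are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(bev_queries, image_tokens, w_q, w_k, w_v):
    """Single-head cross-attention: BEV cells (queries) attend to image tokens."""
    q = bev_queries @ w_q                      # (num_cells, dim)
    k = image_tokens @ w_k                     # (num_tokens, dim)
    v = image_tokens @ w_v                     # (num_tokens, dim)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (num_cells, num_tokens)
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


def refine_bev(lidar_bev, image_tokens, params, num_iters=2):
    """Start from the LiDAR BEV prior and apply residual attention updates."""
    bev = lidar_bev
    for _ in range(num_iters):
        update, weights = cross_attention(bev, image_tokens, *params)
        bev = bev + update
    return bev, weights


rng = np.random.default_rng(0)
num_cells, num_tokens, dim = 16, 24, 8  # toy sizes, assumed for illustration

# Random projection matrices stand in for learned weights (assumption).
params = tuple(rng.normal(scale=dim ** -0.5, size=(dim, dim)) for _ in range(3))

# One flattened LiDAR BEV map and one set of image tokens per agent.
vehicle_bev = rng.normal(size=(num_cells, dim))
infra_bev = rng.normal(size=(num_cells, dim))
vehicle_img = rng.normal(size=(num_tokens, dim))
infra_img = rng.normal(size=(num_tokens, dim))

vehicle_image_bev, vehicle_weights = refine_bev(vehicle_bev, vehicle_img, params)
infra_image_bev, infra_weights = refine_bev(infra_bev, infra_img, params)

# Placeholder fusion (assumption): element-wise mean of the two BEV maps.
fused_bev = 0.5 * (vehicle_image_bev + infra_image_bev)
print(fused_bev.shape)
```

Because each agent's image features end up on the same BEV grid as its LiDAR prior, the two maps can be combined cell by cell, which is the sense in which the representation is "unified" in this sketch.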

Original language: English
Title of host publication: Intelligent Transportation Engineering - Proceedings of the 10th International Conference on Intelligent Transportation Engineering, ICITE 2025
Editors: Yanyan Chen
Publisher: IOS Press BV
Pages: 411-422
Number of pages: 12
ISBN (Electronic): 9781643686400
DOIs
Publication status: Published - 8 Jan 2026
Event: 10th International Conference on Intelligent Transportation Engineering, ICITE 2025 - Beijing, China
Duration: 24 Oct 2025 to 26 Oct 2025

Publication series

Name: Advances in Transdisciplinary Engineering
Volume: 84
ISSN (Print): 2352-751X
ISSN (Electronic): 2352-7528

Conference

Conference: 10th International Conference on Intelligent Transportation Engineering, ICITE 2025
Country/Territory: China
City: Beijing
Period: 24/10/25 to 26/10/25

Keywords

  • 3D Object Detection
  • BEV representations
  • Cooperative Perception
  • LiDAR and Camera
