ORSU: An Online Road Scene Understanding Framework with Multi-level Information Fusion and Bird’s-Eye View Representation

Research output: Contribution to journal › Article › peer-review

Abstract

Autonomous vehicles require a comprehensive understanding of the surrounding road scene for navigation. Existing methods typically acquire local road maps from offline high-definition (HD) maps and perceive 3D objects from multi-modal sensor data. However, offline HD maps have limited coverage, which restricts where autonomous vehicles can operate. Moreover, effectively integrating complementary information from multi-modal sensors remains a challenge and limits the performance of road scene understanding. To address these challenges, this paper proposes an online road scene understanding framework for regions lacking offline HD maps. The framework takes multi-modal sensor data as input and achieves accurate and comprehensive scene understanding through multi-task learning and multi-level information fusion. Specifically, multi-task learning is used to jointly perceive 3D objects and instance-level map elements within a unified framework. To further enhance scene understanding, three multi-level information fusion modules are employed to integrate complementary information from multi-modal sensors: camera-to-BEV transformation (CBT), region associative feature decoration (RAFD), and multi-modal feature adaptive fusion (MFAF), where BEV denotes the bird's-eye view. The CBT module implements camera view transformation by explicitly considering the depth information of LiDAR points, the RAFD module achieves data-level fusion through region-level feature decoration, and the MFAF module performs BEV-level fusion through an attention mechanism. Experimental results on the nuScenes dataset demonstrate the effectiveness of the proposed framework and show that the multi-level fusion strategies improve road scene understanding.
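
As a rough illustration of what attention-based BEV-level fusion can look like, the sketch below blends a camera BEV feature map and a LiDAR BEV feature map with per-cell weights obtained from a softmax over the two modalities. It is not the paper's MFAF module, whose design the abstract does not specify: the function name `fuse_bev`, the linear scoring vectors `w_cam` and `w_lidar`, and the feature shapes are all assumptions made for this example.

```python
import numpy as np


def softmax(x, axis=0):
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def fuse_bev(cam_bev, lidar_bev, w_cam, w_lidar):
    """Blend two BEV feature maps with per-cell attention weights.

    cam_bev, lidar_bev: arrays of shape (C, H, W) on the same BEV grid.
    w_cam, w_lidar: hypothetical scoring vectors of shape (C,); in a real
    model these would be learned parameters.
    """
    # One scalar relevance score per BEV cell and modality: shape (H, W).
    score_cam = np.tensordot(w_cam, cam_bev, axes=([0], [0]))
    score_lidar = np.tensordot(w_lidar, lidar_bev, axes=([0], [0]))
    # Softmax over the modality axis, so the two weights sum to 1 per cell.
    weights = softmax(np.stack([score_cam, score_lidar]), axis=0)
    # Weighted sum; each (H, W) weight map broadcasts over the channel axis.
    fused = weights[0] * cam_bev + weights[1] * lidar_bev
    return fused, weights


# Toy usage with random features (illustrative shapes only).
rng = np.random.default_rng(0)
cam = rng.normal(size=(4, 8, 8))
lidar = rng.normal(size=(4, 8, 8))
fused, weights = fuse_bev(cam, lidar, rng.normal(size=4), rng.normal(size=4))
```

Because the weights are computed per BEV cell, such a scheme can lean on LiDAR features where camera evidence is weak and vice versa; with all-zero scoring vectors it reduces to a plain average of the two maps.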

Original language: English
Journal: Automotive Innovation
DOIs
Publication status: Accepted/In press - 2026
Externally published: Yes

Keywords

  • 3D object detection
  • Multi-modal fusion
  • Road map estimation
  • Road scene understanding
