TY - JOUR
T1 - ORSU: An Online Road Scene Understanding Framework with Multi-level Information Fusion and Bird’s-Eye View Representation
AU - Jia, Peng
AU - Gong, Jianwei
AU - Ju, Zhiyang
N1 - Publisher Copyright: © China Society of Automotive Engineers (China SAE) 2026.
PY - 2026
Y1 - 2026
AB - Autonomous vehicles require a comprehensive understanding of the surrounding road scene for navigation. Existing methods typically acquire local road maps from offline high-definition (HD) maps and perceive 3D objects from multi-modal sensor data. However, offline HD maps have limited coverage, restricting the flexibility of autonomous vehicles. Moreover, effectively integrating complementary information from multi-modal sensors remains a challenge, which limits the performance of road scene understanding. To address these challenges, this paper proposes an online road scene understanding framework for regions lacking offline HD maps. The framework takes multi-modal sensor data as input and achieves accurate and comprehensive scene understanding through multi-task learning and multi-level information fusion strategies. Specifically, multi-task learning is used to jointly perceive 3D objects and instance-level map elements within a unified framework. To further enhance scene understanding, multi-level information fusion strategies, implemented as camera-to-BEV transformation (CBT), region associative feature decoration (RAFD), and multi-modal feature adaptive fusion (MFAF) modules, are employed to integrate complementary information from multi-modal sensors. The CBT module implements the camera view transformation by explicitly considering the depth information of LiDAR points, the RAFD module achieves data-level fusion through region-level feature decoration, and the MFAF module performs BEV-level fusion through an attention mechanism. Experimental results on the nuScenes dataset demonstrate the effectiveness of the proposed framework and highlight the superiority of the multi-level fusion strategies in improving road scene understanding.
KW - 3D object detection
KW - Multi-modal fusion
KW - Road map estimation
KW - Road scene understanding
UR - https://www.scopus.com/pages/publications/105028226698
U2 - 10.1007/s42154-025-00387-3
DO - 10.1007/s42154-025-00387-3
M3 - Article
AN - SCOPUS:105028226698
SN - 2096-4250
JO - Automotive Innovation
JF - Automotive Innovation
ER -