PolarGFusion3D: Polar Graph Fusion Network for Enhanced Multimodal 3D Perception in Intelligent Vehicles

  • Luxing Li
  • Chao Wei*
  • *Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Multimodal fusion technology significantly enhances the safety and perception capabilities of intelligent vehicles. Recently, replacing Cartesian voxels with polar voxels in 3D perception tasks has substantially improved spatial occupancy rates and adaptability. However, the uneven distribution of polar voxels introduces new challenges: feature information distortion and reduced real-time performance. To address these issues, this paper proposes a multimodal fusion network based on polar graphs. Raw data from LiDAR, cameras, and millimeter-wave (MMW) radar are first preprocessed, and point-graph and voxel-graph structures are constructed in polar coordinates. Graph Attention Networks (GAT) then extract and aggregate features at multiple levels to form a polar Bird's Eye View (BEV) feature map. At the BEV level, multimodal features are fused and multi-scale features are aggregated using a multi-scale GAT, and a polar-based CenterHead completes the 3D perception task. Extensive experiments on the nuScenes dataset and real-vehicle test data demonstrate that the model's detection precision (70.5% mAP) and inference speed (12.6 Hz) surpass those of the compared models, establishing a new state of the art (SOTA). The model also exhibits high perception accuracy, robustness, and generalizability across various real-vehicle scenarios.
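
As an illustration of the polar voxelization the abstract refers to, the following is a minimal sketch, not the authors' implementation. It assigns points to cells of a polar BEV grid and computes the cell area per radial ring, which shows why polar voxels are unevenly sized. The function names, the uniform radial spacing, and the grid parameters (`r_max`, `n_r`, `n_theta`) are assumptions made for this example only.

```python
import numpy as np


def polar_voxel_indices(points, r_max=50.0, n_r=5, n_theta=4):
    """Map points (N, >=2) with x, y columns to (radial, angular) bin indices.

    Hypothetical helper: assumes uniform radial spacing and drops points
    whose range is r_max or more.
    """
    x, y = points[:, 0], points[:, 1]
    r = np.hypot(x, y)                      # range in the ground plane
    theta = np.arctan2(y, x)                # azimuth in [-pi, pi]
    r_idx = np.floor(r / r_max * n_r).astype(int)
    # Shift azimuth to [0, 2*pi] before binning; the modulo wraps theta == pi.
    theta_idx = np.floor((theta + np.pi) / (2 * np.pi) * n_theta).astype(int) % n_theta
    valid = r < r_max
    return r_idx[valid], theta_idx[valid]


def ring_cell_areas(r_max=50.0, n_r=5, n_theta=4):
    """Area of one polar cell in each radial ring (annular sector area)."""
    edges = np.linspace(0.0, r_max, n_r + 1)
    d_theta = 2 * np.pi / n_theta
    return 0.5 * d_theta * (edges[1:] ** 2 - edges[:-1] ** 2)


points = np.array([
    [1.0, 1.0, 0.0],
    [-1.0, 12.0, 0.0],
    [-30.0, -5.0, 0.0],
    [100.0, 0.0, 0.0],   # beyond r_max, so it is dropped
])
r_idx, theta_idx = polar_voxel_indices(points)
areas = ring_cell_areas()
```

With uniform radial spacing, the cell area grows with the ring index, so cells far from the sensor cover much more ground than cells near it. This is one simple way to see the uneven voxel distribution that the abstract names as the source of feature distortion.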

Original language: English
Pages (from-to): 36-47
Number of pages: 12
Journal: IEEE Transactions on Intelligent Vehicles
Volume: 10
Issue number: 1
Publication status: Published - 2025
Externally published: Yes

Keywords

  • 3D perception
  • Multimodal fusion
  • Graph attention networks
  • Intelligent vehicles
