Three-Dimensional Object Detection Network Based on Multi-Layer and Multi-Modal Fusion

Wenming Zhu, Jia Zhou, Zizhe Wang, Xuehua Zhou*, Feng Zhou, Jingwen Sun, Mingrui Song, Zhiguo Zhou

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Cameras and LiDAR are important sensors in autonomous driving systems and can provide complementary information to each other. However, most LiDAR-only methods outperform fusion methods on the main benchmark datasets. Current studies attribute this to misalignment between views and the difficulty of matching heterogeneous features. In particular, single-stage fusion methods struggle to fully fuse image and point-cloud features. In this work, we propose a 3D object detection network based on a multi-layer and multi-modal fusion (3DMMF) method. 3DMMF paints and encodes the point cloud within the frustums proposed by a 2D object detection network. The painted point cloud is then fed to a LiDAR-only object detection network with expanded channels and a self-attention mechanism module. Finally, the camera-LiDAR object candidates fusion (CLOCs) method for 3D object detection is used to match the geometric direction features and category semantic features of the 2D and 3D detection results. Experiments on the KITTI dataset (a public dataset) show that this fusion method yields a significant improvement over the LiDAR-only baseline, with an average mAP gain of 6.3%.
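The "painting" step described above follows the general point-painting idea: each LiDAR point is projected into the image and augmented with the 2D detector's per-pixel class scores before being passed to the 3D network. The following is a minimal generic sketch of that projection-and-append step, not the paper's implementation; the function name, score-map layout, and default value are assumptions.

```python
import numpy as np

def paint_points(points_xyz, P, class_scores, default=0.0):
    """Generic point-painting sketch (hypothetical, not 3DMMF's code).

    points_xyz   : (N, 3) LiDAR points already in the camera frame
    P            : (3, 4) camera projection matrix
    class_scores : (H, W, C) per-pixel class-score map from a 2D detector
    Returns (N, 3 + C) painted points; points projecting outside the
    image (or behind the camera) receive `default` scores.
    """
    N = points_xyz.shape[0]
    H, W, C = class_scores.shape
    # Homogeneous projection into pixel coordinates.
    homo = np.hstack([points_xyz, np.ones((N, 1))])          # (N, 4)
    uvw = homo @ P.T                                         # (N, 3)
    z = uvw[:, 2]
    u = np.round(uvw[:, 0] / np.clip(z, 1e-6, None)).astype(int)
    v = np.round(uvw[:, 1] / np.clip(z, 1e-6, None)).astype(int)
    # Keep only points in front of the camera and inside the image.
    inside = (z > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    painted = np.full((N, C), default, dtype=np.float32)
    painted[inside] = class_scores[v[inside], u[inside]]
    # Concatenate the score channels onto the original coordinates.
    return np.hstack([points_xyz, painted])
```

In 3DMMF this painting is restricted to points inside the 2D detector's frustum proposals, and the downstream network's input channels are expanded to accept the extra score dimensions.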

Original language: English
Article number: 3512
Journal: Electronics (Switzerland)
Volume: 13
Issue number: 17
Publication status: Published - Sept 2024

Keywords

  • 3D object detection
  • auto-driving
  • multi-sensor fusion
  • self-attention mechanism
