Towards efficient multi-modal 3D object detection: Homogeneous sparse fuse network

Yingjuan Tang, Hongwen He*, Yong Wang, Jingda Wu

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

2 Citations (Scopus)

Abstract

LiDAR-only 3D detection methods struggle with the sparsity of point clouds. Multi-modal methods have been proposed to overcome this issue, but fusion remains challenging because images and point clouds have heterogeneous representations. This paper proposes a novel multi-modal framework, Homogeneous Sparse Fusion (HS-Fusion), which generates pseudo point clouds via depth completion. The framework introduces a 3D foreground-aware middle extractor that efficiently extracts high-response foreground features from sparse point cloud data and can be integrated into existing sparse convolutional neural networks. Furthermore, the proposed homogeneous attentive fusion enables cross-modality-consistent fusion. Finally, HS-Fusion simultaneously combines 2D image features and the 3D geometric features of the pseudo point clouds through multi-representation feature extraction. The proposed network attains better performance on 3D object detection benchmarks; in particular, it achieves a 4.02% accuracy improvement over the pure (LiDAR-only) model, and its inference speed surpasses that of comparable models, further validating the efficacy of HS-Fusion.
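The abstract gives no implementation details, but the pseudo-point-cloud step it describes is typically plain pinhole back-projection: each pixel of the completed depth map is lifted into 3D camera coordinates using the camera intrinsics. A minimal sketch under that assumption (the function name and parameters are illustrative, not taken from the paper):

```python
# Hedged sketch: lifting a completed depth map to a pseudo point cloud.
# Assumes a pinhole camera model; this is not the paper's actual code.
import numpy as np

def backproject_depth(depth: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Turn an HxW depth map (meters) into an Nx3 point cloud.

    K is the 3x3 intrinsic matrix [[fx, 0, cx], [0, fy, cy], [0, 0, 1]].
    """
    h, w = depth.shape
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    points = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # keep only pixels with valid depth
```

Because the resulting pseudo points live in the same 3D space as the LiDAR points, both modalities can be voxelized and processed by the same sparse convolutional backbone, which is presumably what makes the fusion "homogeneous".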

Original language: English
Article number: 124945
Journal: Expert Systems with Applications
Volume: 256
DOIs
Publication status: Published - 5 Dec 2024

Keywords

  • 3D object detection
  • Autonomous driving
  • Homogeneous fusion
  • Multi-modal
  • Point cloud and image fusion
  • Sparse convolutional networks
