TY - JOUR
T1 - Towards efficient multi-modal 3D object detection
T2 - Homogeneous sparse fuse network
AU - Tang, Yingjuan
AU - He, Hongwen
AU - Wang, Yong
AU - Wu, Jingda
N1 - Publisher Copyright:
© 2024
PY - 2024/12/5
Y1 - 2024/12/5
N2 - LiDAR-only 3D detection methods struggle with the sparsity of point clouds. Multi-modal methods have been proposed to overcome this issue, but fusing images and point clouds is challenging because the two modalities have heterogeneous representations. This paper proposes a novel multi-modal framework, Homogeneous Sparse Fusion (HS-Fusion), which generates pseudo point clouds via depth completion. The framework introduces a 3D foreground-aware middle extractor that efficiently extracts high-response foreground features from sparse point cloud data; this module can be integrated into existing sparse convolutional neural networks. Furthermore, the proposed homogeneous attentive fusion enables cross-modality consistent fusion. Finally, HS-Fusion simultaneously combines 2D image features and 3D geometric features of the pseudo point clouds through multi-representation feature extraction. The proposed network attains better performance on 3D object detection benchmarks; in particular, it improves accuracy by 4.02% over the pure LiDAR-only model, and its inference speed surpasses that of other models, further validating the efficacy of HS-Fusion.
AB - LiDAR-only 3D detection methods struggle with the sparsity of point clouds. Multi-modal methods have been proposed to overcome this issue, but fusing images and point clouds is challenging because the two modalities have heterogeneous representations. This paper proposes a novel multi-modal framework, Homogeneous Sparse Fusion (HS-Fusion), which generates pseudo point clouds via depth completion. The framework introduces a 3D foreground-aware middle extractor that efficiently extracts high-response foreground features from sparse point cloud data; this module can be integrated into existing sparse convolutional neural networks. Furthermore, the proposed homogeneous attentive fusion enables cross-modality consistent fusion. Finally, HS-Fusion simultaneously combines 2D image features and 3D geometric features of the pseudo point clouds through multi-representation feature extraction. The proposed network attains better performance on 3D object detection benchmarks; in particular, it improves accuracy by 4.02% over the pure LiDAR-only model, and its inference speed surpasses that of other models, further validating the efficacy of HS-Fusion.
KW - 3D object detection
KW - Autonomous driving
KW - Homogeneous fusion
KW - Multi-modal
KW - Point cloud and image fusion
KW - Sparse convolutional networks
UR - http://www.scopus.com/inward/record.url?scp=85200534785&partnerID=8YFLogxK
U2 - 10.1016/j.eswa.2024.124945
DO - 10.1016/j.eswa.2024.124945
M3 - Article
AN - SCOPUS:85200534785
SN - 0957-4174
VL - 256
JO - Expert Systems with Applications
JF - Expert Systems with Applications
M1 - 124945
ER -