TY - JOUR
T1 - 3DMMF
T2 - 2024 4th International Conference on Artificial Intelligence and Industrial Technology Applications, AIITA 2024
AU - Zhou, Jia
AU - Xu, Limei
AU - Ma, Wenhao
AU - Zhou, Zhiguo
AU - Zhou, Xuehua
AU - Shi, Yonggang
N1 - Publisher Copyright:
© Published under licence by IOP Publishing Ltd.
PY - 2024
Y1 - 2024
N2 - Cameras and LiDAR are important sensors in autonomous driving systems and can provide complementary information to each other. However, most LiDAR-only methods outperform fusion methods on the main benchmark datasets. Current studies attribute this to the misalignment of views and the difficulty of matching heterogeneous features. In particular, single-stage fusion methods struggle to fully fuse image and point-cloud features. In this work, we propose 3DMMF, a 3D object detection network based on a multi-layer, multi-modal fusion method. 3DMMF paints and encodes the point cloud within the frustums proposed by a 2D object detection network. The painted point cloud is then fed to a LiDAR-only object detection network with expanded channels and a self-attention module. Finally, CLOCs is used to match the geometric direction features and category semantic features of the 2D and 3D detection results. Experiments on the KITTI dataset show that this fusion method yields a significant improvement over the LiDAR-only baseline, with an average mAP improvement of 6.3%.
AB - Cameras and LiDAR are important sensors in autonomous driving systems and can provide complementary information to each other. However, most LiDAR-only methods outperform fusion methods on the main benchmark datasets. Current studies attribute this to the misalignment of views and the difficulty of matching heterogeneous features. In particular, single-stage fusion methods struggle to fully fuse image and point-cloud features. In this work, we propose 3DMMF, a 3D object detection network based on a multi-layer, multi-modal fusion method. 3DMMF paints and encodes the point cloud within the frustums proposed by a 2D object detection network. The painted point cloud is then fed to a LiDAR-only object detection network with expanded channels and a self-attention module. Finally, CLOCs is used to match the geometric direction features and category semantic features of the 2D and 3D detection results. Experiments on the KITTI dataset show that this fusion method yields a significant improvement over the LiDAR-only baseline, with an average mAP improvement of 6.3%.
UR - http://www.scopus.com/inward/record.url?scp=85201735168&partnerID=8YFLogxK
U2 - 10.1088/1742-6596/2816/1/012027
DO - 10.1088/1742-6596/2816/1/012027
M3 - Conference article
AN - SCOPUS:85201735168
SN - 1742-6588
VL - 2816
JO - Journal of Physics: Conference Series
JF - Journal of Physics: Conference Series
IS - 1
M1 - 012027
Y2 - 12 April 2024 through 14 April 2024
ER -