3DMMF: 3D object detection network based on multi-layer and multi-modal fusion

Jia Zhou; Limei Xu; Wenhao Ma; Zhiguo Zhou; Xuehua Zhou; Yonggang Shi

doi:10.1088/1742-6596/2816/1/012027

Corresponding author: Zhiguo Zhou

School of Integrated Circuits and Electronics

Research output: Contribution to journal › Conference article › Peer-reviewed

Abstract

Cameras and LiDAR are important sensors in autonomous driving systems that can provide complementary information to each other. However, most of the LiDAR-only methods outperform the fusion method on the main benchmark datasets. Current studies attribute the reasons to the misalignment of views and difficulty in matching heterogeneous features. In particular, the single-stage fusion method makes it difficult to fully fuse the features of the image and point cloud. In this work, we propose 3DMMF: a 3D object detection network based on multi-layer and multi-modal fusion methods. 3DMMF works by painting and encoding point clouds in the frustum proposed by the 2D object detection network. Then the painted point cloud is fed to the LiDAR-only object detection network which has expanded channels and a self-attention mechanism module. Finally, CLOCs are used to match the geometric direction features and category semantic features of 2D and 3D detection results. Experiments on KITTI datasets show that this fusion method has a significant improvement over the baseline of the LiDAR-only method, with an average mAP improvement of 6.3%.
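The painting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the encoding, so this assumes that each 2D detection carries a per-class score vector, that points are already in the camera frame, and that a 3x4 projection matrix is available. The names `project` and `paint_points` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def project(point: Point, proj: Sequence[Sequence[float]]) -> Optional[Tuple[float, float]]:
    """Project a 3D point (camera frame) to pixel coordinates with a 3x4 matrix."""
    x, y, z = point
    u, v, w = (row[0] * x + row[1] * y + row[2] * z + row[3] for row in proj)
    if w <= 0:  # behind the camera, so it cannot lie inside any frustum
        return None
    return u / w, v / w


def paint_points(
    points: Sequence[Point],
    boxes: Sequence[Box],
    scores: Sequence[Sequence[float]],
    proj: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Append class scores to every point whose projection falls in a 2D box.

    Points outside all frustums get zeros. Where boxes overlap, the
    element-wise maximum is kept (an assumption made for this sketch).
    """
    num_classes = len(scores[0]) if scores else 0
    painted = []
    for p in points:
        extra = [0.0] * num_classes
        uv = project(p, proj)
        if uv is not None:
            for box, s in zip(boxes, scores):
                if box[0] <= uv[0] <= box[2] and box[1] <= uv[1] <= box[3]:
                    extra = [max(a, b) for a, b in zip(extra, s)]
        painted.append(list(p) + extra)
    return painted
```

The painted points have extra channels, which is why a downstream LiDAR-only detector would need its input channels expanded, as the abstract notes.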

Original language	English
Article number	012027
Journal	Journal of Physics: Conference Series
Volume	2816
Issue	1
DOI	https://doi.org/10.1088/1742-6596/2816/1/012027
Publication status	Published - 2024
Event	2024 4th International Conference on Artificial Intelligence and Industrial Technology Applications, AIITA 2024 - Hybrid, Guangzhou, China. Duration: 12 Apr 2024 → 14 Apr 2024


Cite this

@article{9fd44370187e45399c61b867ed2a361d,
title = "3DMMF: 3D object detection network based on multi-layer and multi-modal fusion",
abstract = "Cameras and LiDAR are important sensors in autonomous driving systems that can provide complementary information to each other. However, most of the LiDAR-only methods outperform the fusion method on the main benchmark datasets. Current studies attribute the reasons to the misalignment of views and difficulty in matching heterogeneous features. In particular, the single-stage fusion method makes it difficult to fully fuse the features of the image and point cloud. In this work, we propose 3DMMF: a 3D object detection network based on multi-layer and multi-modal fusion methods. 3DMMF works by painting and encoding point clouds in the frustum proposed by the 2D object detection network. Then the painted point cloud is fed to the LiDAR-only object detection network which has expanded channels and a self-attention mechanism module. Finally, CLOCs are used to match the geometric direction features and category semantic features of 2D and 3D detection results. Experiments on KITTI datasets show that this fusion method has a significant improvement over the baseline of the LiDAR-only method, with an average mAP improvement of 6.3%.",

author = "Jia Zhou and Limei Xu and Wenhao Ma and Zhiguo Zhou and Xuehua Zhou and Yonggang Shi",
note = "Publisher Copyright: {\textcopyright} Published under licence by IOP Publishing Ltd.; 2024 4th International Conference on Artificial Intelligence and Industrial Technology Applications, AIITA 2024 ; Conference date: 12-04-2024 Through 14-04-2024",
year = "2024",
doi = "10.1088/1742-6596/2816/1/012027",
language = "English",
volume = "2816",
journal = "Journal of Physics: Conference Series",
issn = "1742-6588",
publisher = "IOP Publishing Ltd.",
number = "1",
}

TY - JOUR
T1 - 3DMMF: 3D object detection network based on multi-layer and multi-modal fusion
T2 - 2024 4th International Conference on Artificial Intelligence and Industrial Technology Applications, AIITA 2024
AU - Zhou, Jia
AU - Xu, Limei
AU - Ma, Wenhao
AU - Zhou, Zhiguo
AU - Zhou, Xuehua
AU - Shi, Yonggang
PY - 2024
Y1 - 2024
N2 - Cameras and LiDAR are important sensors in autonomous driving systems that can provide complementary information to each other. However, most of the LiDAR-only methods outperform the fusion method on the main benchmark datasets. Current studies attribute the reasons to the misalignment of views and difficulty in matching heterogeneous features. In particular, the single-stage fusion method makes it difficult to fully fuse the features of the image and point cloud. In this work, we propose 3DMMF: a 3D object detection network based on multi-layer and multi-modal fusion methods. 3DMMF works by painting and encoding point clouds in the frustum proposed by the 2D object detection network. Then the painted point cloud is fed to the LiDAR-only object detection network which has expanded channels and a self-attention mechanism module. Finally, CLOCs are used to match the geometric direction features and category semantic features of 2D and 3D detection results. Experiments on KITTI datasets show that this fusion method has a significant improvement over the baseline of the LiDAR-only method, with an average mAP improvement of 6.3%.

UR - http://www.scopus.com/inward/record.url?scp=85201735168&partnerID=8YFLogxK
U2 - 10.1088/1742-6596/2816/1/012027
DO - 10.1088/1742-6596/2816/1/012027
M3 - Conference article
AN - SCOPUS:85201735168
SN - 1742-6588
VL - 2816
JO - Journal of Physics: Conference Series
JF - Journal of Physics: Conference Series
IS - 1
M1 - 012027
Y2 - 12 April 2024 through 14 April 2024
ER -
