Three-Dimensional Object Detection Network Based on Multi-Layer and Multi-Modal Fusion

Wenming Zhu, Jia Zhou, Zizhe Wang, Xuehua Zhou*, Feng Zhou, Jingwen Sun, Mingrui Song, Zhiguo Zhou

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Cameras and LiDAR are important sensors in autonomous driving systems and can provide complementary information to each other. However, most LiDAR-only methods outperform fusion methods on the main benchmark datasets. Current studies attribute this to misalignment between views and the difficulty of matching heterogeneous features. In particular, single-stage fusion methods struggle to fully fuse image and point-cloud features. In this work, we propose a 3D object detection network based on a multi-layer and multi-modal fusion (3DMMF) method. 3DMMF paints and encodes the point cloud within the frustums proposed by a 2D object detection network. The painted point cloud is then fed to a LiDAR-only object detection network with expanded channels and a self-attention mechanism module. Finally, the camera-LiDAR object candidates fusion (CLOCs) method for 3D object detection is used to match the geometric direction features and category semantic features of the 2D and 3D detection results. Experiments on the KITTI dataset (a public dataset) show that this fusion method yields a significant improvement over the LiDAR-only baseline, with an average mAP gain of 6.3%.
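The "painting" step described above follows the general point-painting idea: each LiDAR point is projected into the image and augmented with the 2D detector's per-pixel class scores before being passed to the 3D network. The following is a minimal generic sketch of that projection-and-append step, not the paper's implementation; the function name, score-map layout, and default value are assumptions.

```python
import numpy as np

def paint_points(points_xyz, P, class_scores, default=0.0):
    """Generic point-painting sketch (hypothetical, not 3DMMF's code).

    points_xyz   : (N, 3) LiDAR points already in the camera frame
    P            : (3, 4) camera projection matrix
    class_scores : (H, W, C) per-pixel class-score map from a 2D detector
    Returns (N, 3 + C) painted points; points projecting outside the
    image (or behind the camera) receive `default` scores.
    """
    N = points_xyz.shape[0]
    H, W, C = class_scores.shape
    # Homogeneous projection into pixel coordinates.
    homo = np.hstack([points_xyz, np.ones((N, 1))])          # (N, 4)
    uvw = homo @ P.T                                         # (N, 3)
    z = uvw[:, 2]
    u = np.round(uvw[:, 0] / np.clip(z, 1e-6, None)).astype(int)
    v = np.round(uvw[:, 1] / np.clip(z, 1e-6, None)).astype(int)
    # Keep only points in front of the camera and inside the image.
    inside = (z > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    painted = np.full((N, C), default, dtype=np.float32)
    painted[inside] = class_scores[v[inside], u[inside]]
    # Concatenate the score channels onto the original coordinates.
    return np.hstack([points_xyz, painted])
```

In 3DMMF this painting is restricted to points inside the 2D detector's frustum proposals, and the downstream network's input channels are expanded to accept the extra score dimensions.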

Original language: English
Article number: 3512
Journal: Electronics (Switzerland)
Volume: 13
Issue number: 17
Publication status: Published - Sept 2024

Keywords

  • 3D object detection
  • auto-driving
  • multi-sensor fusion
  • self-attention mechanism
