TY - JOUR
T1 - Multi-Modal 3D Object Detection in Autonomous Driving
T2 - A Survey and Taxonomy
AU - Wang, Li
AU - Zhang, Xinyu
AU - Song, Ziying
AU - Bi, Jiangfeng
AU - Zhang, Guoxin
AU - Wei, Haiyue
AU - Tang, Liyao
AU - Yang, Lei
AU - Li, Jun
AU - Jia, Caiyan
AU - Zhao, Lijun
N1 - Publisher Copyright:
© 2016 IEEE.
PY - 2023/7/1
Y1 - 2023/7/1
N2 - Autonomous vehicles require constant environmental perception to obtain the distribution of obstacles and thus drive safely. In particular, 3D object detection is a vital functional module, as it simultaneously predicts the categories, locations, and sizes of surrounding objects. Autonomous vehicles are generally equipped with multiple sensors, including cameras and LiDARs. Because single-modal methods suffer from unsatisfactory detection performance, utilizing multiple modalities as inputs to compensate for single-sensor faults is an attractive alternative. Although many multi-modal fusion detection algorithms exist, a comprehensive and in-depth analysis that clarifies how to fuse multi-modal data effectively is still lacking. Therefore, this paper surveys recent advances in fusion detection methods. First, we present the broad background of multi-modal 3D object detection and identify the characteristics of widely used datasets along with their evaluation metrics. Second, instead of the traditional classification into early, middle, and late fusion, we categorize and analyze fusion methods from three aspects: feature representation, alignment, and fusion, revealing how these methods are implemented at an essential level. Third, we provide an in-depth comparison of their pros and cons and compare their performance on mainstream datasets. Finally, we summarize current challenges and research trends toward realizing the full potential of multi-modal 3D object detection.
AB - Autonomous vehicles require constant environmental perception to obtain the distribution of obstacles and thus drive safely. In particular, 3D object detection is a vital functional module, as it simultaneously predicts the categories, locations, and sizes of surrounding objects. Autonomous vehicles are generally equipped with multiple sensors, including cameras and LiDARs. Because single-modal methods suffer from unsatisfactory detection performance, utilizing multiple modalities as inputs to compensate for single-sensor faults is an attractive alternative. Although many multi-modal fusion detection algorithms exist, a comprehensive and in-depth analysis that clarifies how to fuse multi-modal data effectively is still lacking. Therefore, this paper surveys recent advances in fusion detection methods. First, we present the broad background of multi-modal 3D object detection and identify the characteristics of widely used datasets along with their evaluation metrics. Second, instead of the traditional classification into early, middle, and late fusion, we categorize and analyze fusion methods from three aspects: feature representation, alignment, and fusion, revealing how these methods are implemented at an essential level. Third, we provide an in-depth comparison of their pros and cons and compare their performance on mainstream datasets. Finally, we summarize current challenges and research trends toward realizing the full potential of multi-modal 3D object detection.
KW - 3D object detection
KW - Autonomous driving
KW - multi-modal fusion
UR - http://www.scopus.com/inward/record.url?scp=85153389589&partnerID=8YFLogxK
U2 - 10.1109/TIV.2023.3264658
DO - 10.1109/TIV.2023.3264658
M3 - Article
AN - SCOPUS:85153389589
SN - 2379-8858
VL - 8
SP - 3781
EP - 3798
JO - IEEE Transactions on Intelligent Vehicles
JF - IEEE Transactions on Intelligent Vehicles
IS - 7
ER -