TY - JOUR
T1 - Multi-Modal 3D Object Detection in Autonomous Driving
T2 - A Survey and Taxonomy
AU - Wang, Li
AU - Zhang, Xinyu
AU - Song, Ziying
AU - Bi, Jiangfeng
AU - Zhang, Guoxin
AU - Wei, Haiyue
AU - Tang, Liyao
AU - Yang, Lei
AU - Li, Jun
AU - Jia, Caiyan
AU - Zhao, Lijun
N1 - Publisher Copyright:
© 2016 IEEE.
PY - 2023/7/1
Y1 - 2023/7/1
N2 - Autonomous vehicles require constant environmental perception to obtain the distribution of obstacles and thus drive safely. In particular, 3D object detection is a vital functional module, as it simultaneously predicts the categories, locations, and sizes of surrounding objects. Autonomous vehicles are generally equipped with multiple sensors, including cameras and LiDARs. Because single-modal methods suffer from unsatisfactory detection performance, utilizing multiple modalities as inputs to compensate for single-sensor faults is an attractive alternative. Although many multi-modal fusion detection algorithms exist, a comprehensive and in-depth analysis that clarifies how to fuse multi-modal data effectively is still lacking. Therefore, this paper surveys recent advances in fusion detection methods. First, we present the broad background of multi-modal 3D object detection and identify the characteristics of widely used datasets along with their evaluation metrics. Second, instead of the traditional classification into early, middle, and late fusion, we categorize and analyze fusion methods from three aspects: feature representation, alignment, and fusion, revealing how these methods are implemented at an essential level. Third, we provide an in-depth comparison of their pros and cons and compare their performance on mainstream datasets. Finally, we summarize current challenges and research trends toward realizing the full potential of multi-modal 3D object detection.
AB - Autonomous vehicles require constant environmental perception to obtain the distribution of obstacles and thus drive safely. In particular, 3D object detection is a vital functional module, as it simultaneously predicts the categories, locations, and sizes of surrounding objects. Autonomous vehicles are generally equipped with multiple sensors, including cameras and LiDARs. Because single-modal methods suffer from unsatisfactory detection performance, utilizing multiple modalities as inputs to compensate for single-sensor faults is an attractive alternative. Although many multi-modal fusion detection algorithms exist, a comprehensive and in-depth analysis that clarifies how to fuse multi-modal data effectively is still lacking. Therefore, this paper surveys recent advances in fusion detection methods. First, we present the broad background of multi-modal 3D object detection and identify the characteristics of widely used datasets along with their evaluation metrics. Second, instead of the traditional classification into early, middle, and late fusion, we categorize and analyze fusion methods from three aspects: feature representation, alignment, and fusion, revealing how these methods are implemented at an essential level. Third, we provide an in-depth comparison of their pros and cons and compare their performance on mainstream datasets. Finally, we summarize current challenges and research trends toward realizing the full potential of multi-modal 3D object detection.
KW - 3D object detection
KW - Autonomous driving
KW - multi-modal fusion
UR - http://www.scopus.com/inward/record.url?scp=85153389589&partnerID=8YFLogxK
U2 - 10.1109/TIV.2023.3264658
DO - 10.1109/TIV.2023.3264658
M3 - Article
AN - SCOPUS:85153389589
SN - 2379-8858
VL - 8
SP - 3781
EP - 3798
JO - IEEE Transactions on Intelligent Vehicles
JF - IEEE Transactions on Intelligent Vehicles
IS - 7
ER -