RI-Fusion: 3D Object Detection Using Enhanced Point Features With Range-Image Fusion for Autonomous Driving

Xinyu Zhang; Li Wang; Guoxin Zhang; Tianwei Lan; Haoming Zhang; Lijun Zhao; Jun Li; Lei Zhu; Huaping Liu

doi:10.1109/TIM.2022.3224525

Xinyu Zhang, Li Wang*, Guoxin Zhang, Tianwei Lan, Haoming Zhang, Lijun Zhao, Jun Li, Lei Zhu, Huaping Liu

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

25 Citations (Scopus)

Abstract

3D object detection is becoming indispensable for environmental perception in autonomous driving. Light detection and ranging (LiDAR) point clouds often fail to distinguish objects with similar structures and are quite sparse for distant or small objects, which leads to false and missed detections. To address these issues, LiDAR is often fused with cameras because images provide rich textural information. However, current fusion methods suffer from inefficient data representation and inaccurate alignment of heterogeneous features, leading to poor precision and low efficiency. To this end, we propose a plug-and-play module termed range-image fusion (RI-Fusion) to achieve an effective fusion of LiDAR and camera data, designed to be easily integrated into existing mainstream LiDAR-based algorithms. In this process, we design an image and point cloud alignment method that converts a point cloud into a compact range-view representation through a spherical coordinate transformation. The range image is then integrated with the corresponding camera image using an attention mechanism. The original range image is concatenated with the fusion features to retain point cloud information, and the results are projected back onto the spatial point cloud. Finally, the feature-enhanced point cloud can be input into a LiDAR-based 3D object detector. Validation experiments on the KITTI 3D object detection benchmark showed that the proposed fusion method significantly improved multiple mainstream LiDAR-based 3D object detectors (PointPillars, SECOND, and Part A2), raising the 3D mAP (mean average precision) by 3.61%, 2.98%, and 1.27%, respectively, particularly for small objects such as pedestrians and cyclists.
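The spherical coordinate transformation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 64 x 1024 image size, and the vertical field of view (+2.0 to -24.8 degrees, roughly that of a 64-beam spinning LiDAR) are assumptions chosen for illustration. The sketch maps each point to a range-image pixel and records which point filled each pixel, so that per-pixel features could later be projected back onto the point cloud.

```python
import math


def point_to_range_pixel(x, y, z, width=1024, height=64,
                         fov_up_deg=2.0, fov_down_deg=-24.8):
    """Map one LiDAR point (sensor frame) to (row, col, range), or None.

    All default parameters are illustrative assumptions, not values
    taken from the paper.
    """
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        return None
    yaw = math.atan2(y, x)      # azimuth in [-pi, pi]
    pitch = math.asin(z / r)    # elevation angle
    fov_up = math.radians(fov_up_deg)
    fov_down = math.radians(fov_down_deg)
    if not (fov_down <= pitch <= fov_up):
        return None             # outside the assumed vertical field of view
    u = 0.5 * (1.0 - yaw / math.pi)              # normalized column, 0..1
    v = (fov_up - pitch) / (fov_up - fov_down)   # normalized row, 0 = top
    col = min(width - 1, int(u * width))
    row = min(height - 1, int(v * height))
    return row, col, r


def build_range_image(points, width=1024, height=64, **kw):
    """Build a range image and a pixel -> point-index map.

    When several points fall into one pixel, the closest one is kept.
    A value of 0.0 marks an empty pixel.
    """
    image = [[0.0] * width for _ in range(height)]
    index = {}
    for i, (x, y, z) in enumerate(points):
        hit = point_to_range_pixel(x, y, z, width, height, **kw)
        if hit is None:
            continue
        row, col, r = hit
        if image[row][col] == 0.0 or r < image[row][col]:
            image[row][col] = r
            index[(row, col)] = i
    return image, index


points = [(10.0, 0.0, 0.0), (5.0, 0.0, 0.0), (0.0, 0.0, 10.0)]
image, index = build_range_image(points)
```

The `index` map is one simple way to carry fused per-pixel features back to individual points; keeping only the closest point per pixel is a design choice of this sketch, and other collision policies are possible.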

Original language	English
Article number	5004213
Journal	IEEE Transactions on Instrumentation and Measurement
Volume	72
DOIs	https://doi.org/10.1109/TIM.2022.3224525
Publication status	Published - 2023
Externally published	Yes

Keywords

3D object detection
autonomous driving
feature fusion
multimodal
self-attention

Cite this

@article{a30ccc205f824d4e9c2419bd9076a9ab,
title = "RI-Fusion: 3D Object Detection Using Enhanced Point Features With Range-Image Fusion for Autonomous Driving",
abstract = "3D object detection is becoming indispensable for environmental perception in autonomous driving. Light detection and ranging (LiDAR) point clouds often fail to distinguish objects with similar structures and are quite sparse for distant or small objects, which leads to false and missed detections. To address these issues, LiDAR is often fused with cameras because images provide rich textural information. However, current fusion methods suffer from inefficient data representation and inaccurate alignment of heterogeneous features, leading to poor precision and low efficiency. To this end, we propose a plug-and-play module termed range-image fusion (RI-Fusion) to achieve an effective fusion of LiDAR and camera data, designed to be easily integrated into existing mainstream LiDAR-based algorithms. In this process, we design an image and point cloud alignment method that converts a point cloud into a compact range-view representation through a spherical coordinate transformation. The range image is then integrated with the corresponding camera image using an attention mechanism. The original range image is concatenated with the fusion features to retain point cloud information, and the results are projected back onto the spatial point cloud. Finally, the feature-enhanced point cloud can be input into a LiDAR-based 3D object detector. Validation experiments on the KITTI 3D object detection benchmark showed that the proposed fusion method significantly improved multiple mainstream LiDAR-based 3D object detectors (PointPillars, SECOND, and Part A2), raising the 3D mAP (mean average precision) by 3.61%, 2.98%, and 1.27%, respectively, particularly for small objects such as pedestrians and cyclists.",
keywords = "3D object detection, autonomous driving, feature fusion, multimodal, self-attention",
author = "Xinyu Zhang and Li Wang and Guoxin Zhang and Tianwei Lan and Haoming Zhang and Lijun Zhao and Jun Li and Lei Zhu and Huaping Liu",
note = "Publisher Copyright: {\textcopyright} 1963-2012 IEEE.",
year = "2023",
doi = "10.1109/TIM.2022.3224525",
language = "English",
volume = "72",
journal = "IEEE Transactions on Instrumentation and Measurement",
issn = "0018-9456",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
}

TY  - JOUR
T1  - RI-Fusion
T2  - 3D Object Detection Using Enhanced Point Features With Range-Image Fusion for Autonomous Driving
AU  - Zhang, Xinyu
AU  - Wang, Li
AU  - Zhang, Guoxin
AU  - Lan, Tianwei
AU  - Zhang, Haoming
AU  - Zhao, Lijun
AU  - Li, Jun
AU  - Zhu, Lei
AU  - Liu, Huaping
PY  - 2023
Y1  - 2023
AB  - 3D object detection is becoming indispensable for environmental perception in autonomous driving. Light detection and ranging (LiDAR) point clouds often fail to distinguish objects with similar structures and are quite sparse for distant or small objects, which leads to false and missed detections. To address these issues, LiDAR is often fused with cameras because images provide rich textural information. However, current fusion methods suffer from inefficient data representation and inaccurate alignment of heterogeneous features, leading to poor precision and low efficiency. To this end, we propose a plug-and-play module termed range-image fusion (RI-Fusion) to achieve an effective fusion of LiDAR and camera data, designed to be easily integrated into existing mainstream LiDAR-based algorithms. In this process, we design an image and point cloud alignment method that converts a point cloud into a compact range-view representation through a spherical coordinate transformation. The range image is then integrated with the corresponding camera image using an attention mechanism. The original range image is concatenated with the fusion features to retain point cloud information, and the results are projected back onto the spatial point cloud. Finally, the feature-enhanced point cloud can be input into a LiDAR-based 3D object detector. Validation experiments on the KITTI 3D object detection benchmark showed that the proposed fusion method significantly improved multiple mainstream LiDAR-based 3D object detectors (PointPillars, SECOND, and Part A2), raising the 3D mAP (mean average precision) by 3.61%, 2.98%, and 1.27%, respectively, particularly for small objects such as pedestrians and cyclists.
KW  - 3D object detection
KW  - autonomous driving
KW  - feature fusion
KW  - multimodal
KW  - self-attention
UR  - http://www.scopus.com/inward/record.url?scp=85144079562&partnerID=8YFLogxK
U2  - 10.1109/TIM.2022.3224525
DO  - 10.1109/TIM.2022.3224525
M3  - Article
AN  - SCOPUS:85144079562
SN  - 0018-9456
VL  - 72
JO  - IEEE Transactions on Instrumentation and Measurement
JF  - IEEE Transactions on Instrumentation and Measurement
M1  - 5004213
ER  - 
