FusionRCNN: LiDAR-Camera Fusion for Two-Stage 3D Object Detection

Xinli Xu; Shaocong Dong; Tingfa Xu; Lihe Ding; Jie Wang; Peng Jiang; Liqiang Song; Jianan Li

doi:10.3390/rs15071839

Corresponding author: Jianan Li

School of Optics and Photonics

Research output: Contribution to journal › Article › peer-review

20 Citations (Scopus)

Abstract

Accurate and reliable perception systems are essential for autonomous driving and robotics. To achieve this, 3D object detection with multi-sensors is necessary. Existing 3D detectors have significantly improved accuracy by adopting a two-stage paradigm that relies solely on LiDAR point clouds for 3D proposal refinement. However, the sparsity of point clouds, particularly for faraway points, makes it difficult for the LiDAR-only refinement module to recognize and locate objects accurately. To address this issue, we propose a novel multi-modality two-stage approach called FusionRCNN. This approach effectively and efficiently fuses point clouds and camera images in the Regions of Interest (RoI). The FusionRCNN adaptively integrates both sparse geometry information from LiDAR and dense texture information from the camera in a unified attention mechanism. Specifically, FusionRCNN first utilizes RoIPooling to obtain an image set with a unified size and gets the point set by sampling raw points within proposals in the RoI extraction step. Then, it leverages an intra-modality self-attention to enhance the domain-specific features, followed by a well-designed cross-attention to fuse the information from two modalities. FusionRCNN is fundamentally plug-and-play and supports different one-stage methods with almost no architectural changes. Extensive experiments on KITTI and Waymo benchmarks demonstrate that our method significantly boosts the performances of popular detectors. Remarkably, FusionRCNN improves the strong SECOND baseline by 6.14% mAP on Waymo and outperforms competing two-stage approaches.
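The fusion step described in the abstract (intra-modality self-attention followed by cross-attention between the two modalities) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes a single attention head, no learned projection weights, residual connections, and that point features act as queries over image features in the cross-attention step; the abstract does not specify any of these details. The function names `attention` and `fuse_roi`, and the sizes (256 sampled points, a 7x7 RoI-pooled image patch, 64 channels), are hypothetical and chosen only for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(queries, keys, values):
    """Single-head scaled dot-product attention; returns (output, weights)."""
    d = queries.shape[-1]
    weights = softmax(queries @ keys.T / np.sqrt(d), axis=-1)
    return weights @ values, weights


def fuse_roi(point_feats, image_feats):
    """Fuse per-RoI point and image features (illustrative assumptions only)."""
    # Intra-modality self-attention with a residual connection (assumed).
    p_att, _ = attention(point_feats, point_feats, point_feats)
    points = point_feats + p_att
    i_att, _ = attention(image_feats, image_feats, image_feats)
    images = image_feats + i_att
    # Cross-attention: each point queries the image tokens (assumed direction).
    cross, weights = attention(points, images, images)
    return points + cross, weights


rng = np.random.default_rng(0)
point_feats = rng.normal(size=(256, 64))  # 256 points sampled inside one proposal
image_feats = rng.normal(size=(49, 64))   # 7x7 RoI-pooled image patch, flattened
fused, cross_weights = fuse_roi(point_feats, image_feats)
print(fused.shape, cross_weights.shape)
```

Each row of `cross_weights` is a distribution over the 49 image tokens, so every sparse point receives a weighted summary of the dense image features in its RoI. A real refinement head would add learned projections, multiple heads, and box regression on top of the fused features.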

Original language	English
Article number	1839
Journal	Remote Sensing
Volume	15
Issue number	7
DOIs	https://doi.org/10.3390/rs15071839
Publication status	Published - Apr 2023

Keywords

3D object detection
LiDAR-camera fusion
two-stage

Cite this

@article{ad12e735f3134357950c9484132f3e81,

title = "FusionRCNN: LiDAR-Camera Fusion for Two-Stage 3D Object Detection",

abstract = "Accurate and reliable perception systems are essential for autonomous driving and robotics. To achieve this, 3D object detection with multi-sensors is necessary. Existing 3D detectors have significantly improved accuracy by adopting a two-stage paradigm that relies solely on LiDAR point clouds for 3D proposal refinement. However, the sparsity of point clouds, particularly for faraway points, makes it difficult for the LiDAR-only refinement module to recognize and locate objects accurately. To address this issue, we propose a novel multi-modality two-stage approach called FusionRCNN. This approach effectively and efficiently fuses point clouds and camera images in the Regions of Interest (RoI). The FusionRCNN adaptively integrates both sparse geometry information from LiDAR and dense texture information from the camera in a unified attention mechanism. Specifically, FusionRCNN first utilizes RoIPooling to obtain an image set with a unified size and gets the point set by sampling raw points within proposals in the RoI extraction step. Then, it leverages an intra-modality self-attention to enhance the domain-specific features, followed by a well-designed cross-attention to fuse the information from two modalities. FusionRCNN is fundamentally plug-and-play and supports different one-stage methods with almost no architectural changes. Extensive experiments on KITTI and Waymo benchmarks demonstrate that our method significantly boosts the performances of popular detectors. Remarkably, FusionRCNN improves the strong SECOND baseline by 6.14% mAP on Waymo and outperforms competing two-stage approaches.",

keywords = "3D object detection, LiDAR-camera fusion, two-stage",

author = "Xinli Xu and Shaocong Dong and Tingfa Xu and Lihe Ding and Jie Wang and Peng Jiang and Liqiang Song and Jianan Li",

note = "Publisher Copyright: {\textcopyright} 2023 by the authors.",

year = "2023",

month = apr,

doi = "10.3390/rs15071839",

language = "English",

volume = "15",

journal = "Remote Sensing",

issn = "2072-4292",

publisher = "Multidisciplinary Digital Publishing Institute (MDPI)",

number = "7",

}

TY - JOUR

T1 - FusionRCNN

T2 - LiDAR-Camera Fusion for Two-Stage 3D Object Detection

AU - Xu, Xinli

AU - Dong, Shaocong

AU - Xu, Tingfa

AU - Ding, Lihe

AU - Wang, Jie

AU - Jiang, Peng

AU - Song, Liqiang

AU - Li, Jianan

PY - 2023/4

Y1 - 2023/4

N2 - Accurate and reliable perception systems are essential for autonomous driving and robotics. To achieve this, 3D object detection with multi-sensors is necessary. Existing 3D detectors have significantly improved accuracy by adopting a two-stage paradigm that relies solely on LiDAR point clouds for 3D proposal refinement. However, the sparsity of point clouds, particularly for faraway points, makes it difficult for the LiDAR-only refinement module to recognize and locate objects accurately. To address this issue, we propose a novel multi-modality two-stage approach called FusionRCNN. This approach effectively and efficiently fuses point clouds and camera images in the Regions of Interest (RoI). The FusionRCNN adaptively integrates both sparse geometry information from LiDAR and dense texture information from the camera in a unified attention mechanism. Specifically, FusionRCNN first utilizes RoIPooling to obtain an image set with a unified size and gets the point set by sampling raw points within proposals in the RoI extraction step. Then, it leverages an intra-modality self-attention to enhance the domain-specific features, followed by a well-designed cross-attention to fuse the information from two modalities. FusionRCNN is fundamentally plug-and-play and supports different one-stage methods with almost no architectural changes. Extensive experiments on KITTI and Waymo benchmarks demonstrate that our method significantly boosts the performances of popular detectors. Remarkably, FusionRCNN improves the strong SECOND baseline by 6.14% mAP on Waymo and outperforms competing two-stage approaches.

KW - 3D object detection

KW - LiDAR-camera fusion

KW - two-stage

UR - http://www.scopus.com/inward/record.url?scp=85152633951&partnerID=8YFLogxK

U2 - 10.3390/rs15071839

DO - 10.3390/rs15071839

M3 - Article

AN - SCOPUS:85152633951

SN - 2072-4292

VL - 15

JO - Remote Sensing

JF - Remote Sensing

IS - 7

M1 - 1839

ER -
