Sem-Aug: Improving Camera-LiDAR Feature Fusion With Semantic Augmentation for 3D Vehicle Detection

Lin Zhao, Meiling Wang, Yufeng Yue*

*Corresponding author for this work

Research output: Contribution to journalArticlepeer-review

15 Citations (Scopus)

Abstract

Camera-LiDAR fusion provides precise distance measurements and fine-grained textures, making it a promising option for 3D vehicle detection in autonomous driving scenarios. Previous camera-LiDAR based 3D vehicle detection approaches mainly focused on employing image-based pre-trained models to fetch semantic features. However, these methods may perform inferior to the LiDAR-based ones when lacking semantic segmentation labels in autonomous driving tasks. Motivated by this observation, we propose a novel semantic augmentation method, namely Sem-Aug, to guide high-confidence camera-LiDAR fusion feature generation and boost the performance of multimodal 3D vehicle detection. The key novelty of semantic augmentation lies in the 2D segmentation mask auto-labeling, which provides supervision for semantic segmentation sub-network to mitigate the poor generalization performance of camera-LiDAR fusion. Using semantic-augmentation-guided camera-LiDAR fusion features, Sem-Aug achieves remarkable performance on the representative autonomous driving KITTI dataset compared to both the LiDAR-based baseline and previous multimodal 3D vehicle detectors. Qualitative and quantitative experiments demonstrate that Sem-Aug provides significant improvements in challenging Hard detection scenarios caused by occlusion and truncation.

Original languageEnglish
Pages (from-to)9358-9365
Number of pages8
JournalIEEE Robotics and Automation Letters
Volume7
Issue number4
DOIs
Publication statusPublished - 1 Oct 2022

Keywords

  • Computer vision for transportation
  • intelligent transportation systems
  • object detection
  • segmentation and categorization

Fingerprint

Dive into the research topics of 'Sem-Aug: Improving Camera-LiDAR Feature Fusion With Semantic Augmentation for 3D Vehicle Detection'. Together they form a unique fingerprint.

Cite this