CLFusion: 3D Semantic Segmentation Based on Camera and Lidar Fusion

Tianyue Wang, Rujun Song, Zhuoling Xiao*, Bo Yan, Haojie Qin, Di He

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Citation (Scopus)

Abstract

In the field of autonomous driving, semantic segmentation is crucial for scene understanding. Currently, there are two main approaches: camera-based and Lidar-based. To address the issues of Lidar segmentation lacking texture features and image segmentation lacking distance information, this paper proposes a fusion of camera and Lidar to achieve 3D semantic segmentation. The method uses a dual-stream encoder-decoder network to process camera images and Lidar point clouds, and incorporates a specially designed attention mechanism module for feature fusion. To avoid expensive manual annotation of 3D point clouds, the study also introduces a cross-dataset, cross-modal self-supervised training approach. Experimental results show a 2.4% improvement over the Lidar-only baseline on the SemanticKITTI dataset and a 6% improvement on the nuScenes dataset.
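The paper's exact network is not reproduced here, but the following PyTorch sketch illustrates the general idea described in the abstract: two encoder streams (camera and projected Lidar) whose features are combined by an attention-based fusion module before decoding. All module names, channel counts, and the range-view input layout are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch (not the paper's architecture): a dual-stream encoder
# with a simple channel-attention fusion module and a shared decoder head.
import torch
import torch.nn as nn


def conv_block(in_ch, out_ch):
    """3x3 conv + BN + ReLU, used by both encoder streams."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )


class AttentionFusion(nn.Module):
    """Channel attention over concatenated camera/Lidar features (illustrative)."""

    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        self.proj = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, f_img, f_lidar):
        fused = torch.cat([f_img, f_lidar], dim=1)
        weights = self.gate(fused)           # per-channel attention weights
        return self.proj(fused) * weights    # re-weighted fused features


class DualStreamSegNet(nn.Module):
    """Toy dual-stream encoder-decoder for range-view semantic segmentation."""

    def __init__(self, num_classes=20, width=32):
        super().__init__()
        self.img_enc = conv_block(3, width)    # RGB camera stream
        self.lidar_enc = conv_block(5, width)  # e.g. range-view x, y, z, intensity, depth
        self.fusion = AttentionFusion(width)
        self.decoder = nn.Sequential(
            conv_block(width, width),
            nn.Conv2d(width, num_classes, kernel_size=1),
        )

    def forward(self, img, lidar):
        return self.decoder(self.fusion(self.img_enc(img), self.lidar_enc(lidar)))


if __name__ == "__main__":
    net = DualStreamSegNet()
    img = torch.randn(1, 3, 64, 512)     # camera image aligned to the range view
    lidar = torch.randn(1, 5, 64, 512)   # projected point-cloud channels
    print(net(img, lidar).shape)         # -> torch.Size([1, 20, 64, 512])
```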

Original language: English
Title of host publication: ISCAS 2024 - IEEE International Symposium on Circuits and Systems
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798350330991
DOIs
Publication status: Published - 2024
Externally published: Yes
Event: 2024 IEEE International Symposium on Circuits and Systems, ISCAS 2024 - Singapore, Singapore
Duration: 19 May 2024 – 22 May 2024

Publication series

Name: Proceedings - IEEE International Symposium on Circuits and Systems
ISSN (Print): 0271-4310

Conference

Conference: 2024 IEEE International Symposium on Circuits and Systems, ISCAS 2024
Country/Territory: Singapore
City: Singapore
Period: 19/05/24 – 22/05/24

Keywords

  • 3D semantic segmentation
  • multi-modal fusion
  • self-supervised training
