SE-PointFormer: An Efficient 3D Object Detection Network Based on Image Semantics and Enhanced Point Clouds

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

With the development of autonomous driving technology, 3D object detection in complex dynamic environments has become increasingly important. However, image-based 3D object detection methods struggle to detect objects accurately because they lack depth information. Although LiDAR provides more accurate 3D data, its high cost and data sparsity limit its application scenarios. This paper therefore proposes a 3D object detection network based on image semantics and enhanced point clouds, which uses the rich semantic information of images to address point cloud sparsity, enhance the expressive ability of point clouds, and thereby improve the accuracy of 3D object detection. The method first uses a 2D instance segmentation model to segment images from multiple views and then, in the semantic point cloud construction module, assigns the extracted image semantic information to the corresponding points. At the same time, a virtual point cloud carrying semantic information is constructed through Gaussian sampling. The enhanced point cloud then undergoes feature encoding and feature extraction, and a Transformer-based decoder produces a preliminary prediction of the 3D targets. Finally, a feature space sampling module efficiently fuses image semantic feature maps with the enhanced point cloud features, further refining the detection results. The model is validated on the nuScenes dataset, and the experimental results show that the proposed method outperforms existing comparable 3D object detection algorithms on multiple performance metrics. The mAP and NDS on the test set reach 68.6% and 72.3%, respectively, with the improvement most pronounced for objects represented by sparse point clouds. Ablation experiments further verify the effectiveness of the semantic point cloud construction module and the feature space sampling module in improving detection accuracy.
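The two point cloud enhancement steps named in the abstract (assigning 2D instance semantics to LiDAR points, and building a semantic virtual point cloud through Gaussian sampling) can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the abstract gives no details, so the function names `paint_points` and `gaussian_virtual_points`, the pinhole projection with intrinsics `K`, the extrinsic matrix `T_cam_from_lidar`, and the choice to draw virtual points by Gaussian jitter around real 3D points of each instance are all assumptions made for illustration.

```python
import numpy as np

def paint_points(points, mask, K, T_cam_from_lidar):
    """Give each LiDAR point the instance id of the pixel it projects to.

    points: (N, 3+) array, mask: (H, W) integer instance mask (0 = background),
    K: (3, 3) camera intrinsics, T_cam_from_lidar: (4, 4) extrinsics.
    All names and shapes are hypothetical, not taken from the paper.
    """
    n = points.shape[0]
    homo = np.hstack([points[:, :3], np.ones((n, 1))])
    cam = (T_cam_from_lidar @ homo.T).T[:, :3]      # points in the camera frame
    ids = np.zeros(n, dtype=np.int64)               # 0 = no semantic label
    in_front = cam[:, 2] > 1e-6                     # discard points behind the camera
    uvw = (K @ cam[in_front].T).T
    u = np.floor(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.floor(uvw[:, 1] / uvw[:, 2]).astype(int)
    h, w = mask.shape
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    idx = np.flatnonzero(in_front)[inside]
    ids[idx] = mask[v[inside], u[inside]]
    return ids

def gaussian_virtual_points(points, ids, per_instance=20, sigma=0.1, seed=0):
    """Assumed scheme: jitter randomly chosen real points of each instance
    with isotropic Gaussian noise to densify sparse objects."""
    rng = np.random.default_rng(seed)
    out_pts, out_ids = [], []
    for inst in np.unique(ids):
        if inst == 0:                               # skip unlabeled points
            continue
        members = points[ids == inst, :3]
        picks = members[rng.integers(0, len(members), size=per_instance)]
        out_pts.append(picks + rng.normal(0.0, sigma, size=picks.shape))
        out_ids.append(np.full(per_instance, inst, dtype=np.int64))
    if not out_pts:
        return np.empty((0, 3)), np.empty(0, dtype=np.int64)
    return np.vstack(out_pts), np.concatenate(out_ids)
```

In a multi-camera setup such as nuScenes, `paint_points` would presumably be called once per camera view and the labels merged; how the paper resolves overlapping views or where it centers the Gaussian is not stated in the abstract.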

Original language: English
Title of host publication: Proceedings of the 44th Chinese Control Conference, CCC 2025
Editors: Jian Sun, Hongpeng Yin
Publisher: IEEE Computer Society
Pages: 7707-7714
Number of pages: 8
ISBN (Electronic): 9789887581611
Publication status: Published - 2025
Event: 44th Chinese Control Conference, CCC 2025 - Chongqing, China
Duration: 28 Jul 2025 - 30 Jul 2025

Publication series

Name: Chinese Control Conference, CCC
ISSN (Print): 1934-1768
ISSN (Electronic): 2161-2927

Conference

Conference: 44th Chinese Control Conference, CCC 2025
Country/Territory: China
City: Chongqing
Period: 28/07/25 - 30/07/25

Keywords

  • 3D Object Detection
  • Point Cloud Enhancement
  • Transformer
