TY - GEN
T1 - SE-PointFormer
T2 - 44th Chinese Control Conference, CCC 2025
AU - Yu, Zongni
AU - Jin, Hui
N1 - Publisher Copyright:
© 2025 Technical Committee on Control Theory, Chinese Association of Automation.
PY - 2025
Y1 - 2025
AB - As autonomous driving technology develops, 3D object detection in complex dynamic environments has become increasingly important. However, image-based 3D object detection methods struggle to detect objects accurately because they lack depth information. Although LiDAR provides more accurate 3D data, its high cost and the sparsity of its data limit its application scenarios. This paper therefore proposes a 3D object detection network based on image semantics and enhanced point clouds. It uses the rich semantic information in images to address point cloud sparsity, strengthen the expressive power of point clouds, and thereby improve the accuracy of 3D object detection. First, a two-dimensional instance segmentation model segments images from multiple viewpoints, and the semantic point cloud construction module assigns the extracted image semantic information to the corresponding points. At the same time, a virtual point cloud carrying semantic information is constructed through Gaussian sampling. The enhanced point cloud then undergoes feature encoding and feature extraction, and a Transformer-based decoder produces a preliminary prediction of the 3D targets. Finally, a feature space sampling module efficiently fuses the image semantic feature maps with the enhanced point cloud features to further refine the detection results. The model is validated on the nuScenes dataset, where the proposed method outperforms comparable existing 3D object detection algorithms on multiple performance metrics. On the test set, mAP and NDS reach 68.6% and 72.3%, respectively, with gains especially notable for objects represented by sparse point clouds. Ablation experiments verify that the semantic point cloud construction module and the feature space sampling module each contribute to improved detection accuracy.
KW - 3D Object Detection
KW - Point Cloud Enhancement
KW - Transformer
UR - https://www.scopus.com/pages/publications/105020302913
U2 - 10.23919/CCC64809.2025.11179516
DO - 10.23919/CCC64809.2025.11179516
M3 - Conference contribution
AN - SCOPUS:105020302913
T3 - Chinese Control Conference, CCC
SP - 7707
EP - 7714
BT - Proceedings of the 44th Chinese Control Conference, CCC 2025
A2 - Sun, Jian
A2 - Yin, Hongpeng
PB - IEEE Computer Society
Y2 - 28 July 2025 through 30 July 2025
ER -