TY - GEN
T1 - Point-Voxel Fusion with Adaptive Sectorized Points Sampling for 3D Object Detection
AU - Liu, Yihui
AU - He, Hongwen
AU - Tang, Yingjuan
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2026.
PY - 2026
Y1 - 2026
N2 - The continuous advancement and rapid iteration of autonomous driving technology have made LiDAR-based 3D object detection a critical area of research in both industry and academia. Currently, two widely used approaches are voxel-based and point-based two-stage detection frameworks, while more advanced methods fuse voxel and point feature representations. However, existing voxel-point fusion methods still face challenges such as poor keypoint sampling performance, inadequate multi-scale feature fusion, and low computational efficiency. To address these issues, we propose a novel 3D object detection framework, the adaptive sectorized points sampling network (ASPSnet), which adapts scene encoding to objects of varying scales and achieves efficient voxel-point feature aggregation, resulting in superior detection performance with reduced resource consumption. Experiments on the KITTI dataset show that ASPSnet achieves a 3D mAP of 82.26%, 54.78%, and 69.32% for the car, pedestrian, and cyclist categories at the moderate difficulty level. Experiments on the Waymo Open Dataset show that ASPSnet achieves a 3D mAPH of 70.20%, 77.21%, and 73.75% for the vehicle, pedestrian, and cyclist categories at LEVEL 2 difficulty.
KW - 3D object detection
KW - Autonomous driving perception
KW - Sparse convolution
UR - https://www.scopus.com/pages/publications/105028088976
U2 - 10.1007/978-981-95-4875-0_12
DO - 10.1007/978-981-95-4875-0_12
M3 - Conference contribution
AN - SCOPUS:105028088976
SN - 9789819548743
T3 - Communications in Computer and Information Science
SP - 147
EP - 159
BT - Intelligent Vehicles - 3rd CCF Intelligent Vehicles Symposium, CIVS 2025, Revised Selected Papers
A2 - Li, Huiyun
A2 - Wang, Zhongli
A2 - Zhao, Shuai
A2 - Sun, Peng
A2 - Herrmann, Michael
A2 - Zheng, Xi
A2 - Liu, Yuling
PB - Springer Science and Business Media Deutschland GmbH
T2 - 3rd CCF Intelligent Vehicles Symposium, CIVS 2025
Y2 - 16 August 2025 through 18 August 2025
ER -