TY - GEN
T1 - Distance Awared: Adaptive Voxel Resolution to help 3D Object Detection Networks See Farther
AU - Liao, Zhiyu
AU - Jin, Ying
AU - Ma, Hongbin
AU - Alsumeri, Abdulrahman
N1 - Publisher Copyright:
© 2023 Technical Committee on Control Theory, Chinese Association of Automation.
PY - 2023
Y1 - 2023
N2 - Point clouds are the most widely used data input for modern 3D object detection methods; however, due to the complexity of the environments in which the data are collected, they suffer from inevitable information loss, which is especially severe for distant objects. In this paper, we improve the backbone of voxel-based 3D object detection methods to better detect distant targets. The proposed improvements process raw points at different resolutions according to their distance from the lidar. Specifically, the farther a point is from the lidar, the finer the feature extraction and aggregation we adopt. Our goal is to balance information loss against memory consumption. Moreover, inspired by the success of Transformers in computer vision, we add a Multi-headed Self-attention (MHSA) structure to our modified backbone. MHSA provides a global receptive field, which helps obtain a more informative Bird's Eye View (BEV) representation of the point cloud. Our modifications are plug-and-play and can be used in any 3D object detection method based on voxels and sparse 3D convolution. We evaluated our modifications on KITTI; experiments demonstrate the effectiveness of our approach.
AB - Point clouds are the most widely used data input for modern 3D object detection methods; however, due to the complexity of the environments in which the data are collected, they suffer from inevitable information loss, which is especially severe for distant objects. In this paper, we improve the backbone of voxel-based 3D object detection methods to better detect distant targets. The proposed improvements process raw points at different resolutions according to their distance from the lidar. Specifically, the farther a point is from the lidar, the finer the feature extraction and aggregation we adopt. Our goal is to balance information loss against memory consumption. Moreover, inspired by the success of Transformers in computer vision, we add a Multi-headed Self-attention (MHSA) structure to our modified backbone. MHSA provides a global receptive field, which helps obtain a more informative Bird's Eye View (BEV) representation of the point cloud. Our modifications are plug-and-play and can be used in any 3D object detection method based on voxels and sparse 3D convolution. We evaluated our modifications on KITTI; experiments demonstrate the effectiveness of our approach.
KW - 3D object detection
KW - Multi-headed Self-attention
KW - distant targets
KW - plug and play
KW - voxel-based
UR - http://www.scopus.com/inward/record.url?scp=85175547668&partnerID=8YFLogxK
U2 - 10.23919/CCC58697.2023.10240474
DO - 10.23919/CCC58697.2023.10240474
M3 - Conference contribution
AN - SCOPUS:85175547668
T3 - Chinese Control Conference, CCC
SP - 7995
EP - 8000
BT - 2023 42nd Chinese Control Conference, CCC 2023
PB - IEEE Computer Society
T2 - 42nd Chinese Control Conference, CCC 2023
Y2 - 24 July 2023 through 26 July 2023
ER -