Abstract
Current two-stage detectors benefit substantially from a hybrid representation of points and 3-D voxels, but they incur a high time cost and leave room for improving accuracy on small objects. In contrast, 2-D voxel-based methods tend to be efficient and to perform better on small objects. An intuitive way to optimize a two-stage algorithm is therefore to use a 2-D voxel-based backbone. However, naively substituting one representation for another cannot achieve optimal joint learning of each representation and may reduce accuracy. In this article, we propose hybrid point-voxel RCNN (HPV-RCNN), a novel point cloud detection network that combines the merits of points and 2-D voxels. First, we propose a multiattentive voxel feature encoding (MAVFE) module to exploit multilevel attention over multiscale voxels. We also present a partial fusion pyramid network (PFPN) to effectively integrate multiresolution features and generate high-quality proposals. Then, a multiscale region of interest (RoI)-grid pooling (MSRGP) module is proposed to adaptively abstract proposal-specific features from sampled keypoints in multiple receptive fields. In addition, a cascade attentive module (CAM) is adopted to refine proposals incrementally through multiple successive subnetworks. Our method reaches top performance among two-stage methods in the Cyclist and Pedestrian categories on the KITTI dataset while achieving real-time inference speed. Extensive experiments on the challenging roadside DAIR-V2X-I dataset also demonstrate that our method achieves superior detection performance.
Original language | English |
---|---|
Pages (from-to) | 3066-3076 |
Number of pages | 11 |
Journal | IEEE Transactions on Computational Social Systems |
Volume | 10 |
Issue | 6 |
DOI | |
Publication status | Published - 1 Dec 2023 |