HPV-RCNN: Hybrid Point-Voxel Two-Stage Network for LiDAR-Based 3-D Object Detection

Chen Feng; Chao Xiang; Xiaopo Xie; Yuan Zhang; Mingchuan Yang; Xuesong Li

doi:10.1109/TCSS.2023.3286543

Corresponding author: Xuesong Li

School of Computer Science

Research output: Contribution to journal › Article › peer-review

7 Citations (Scopus)

Abstract

Current two-stage detectors benefit remarkably from a hybrid representation of points and 3-D voxels, but they have a high time cost and leave room for improving accuracy on small objects. In contrast, 2-D voxel-based methods tend to have good efficiency and better performance on small objects. An intuitive way to optimize a two-stage algorithm is to use a 2-D voxel-based backbone. However, naive representation substitution cannot achieve optimal joint learning of each representation and may cause a decrease in accuracy. In this article, we propose hybrid point-voxel RCNN (HPV-RCNN), a novel point cloud detection network that combines the merits of points and 2-D voxels. First, we propose a multiattentive voxel feature encoding module (MAVFE) to exploit multilevel attention of multiscale voxels. We also present a partial fusion pyramid network (PFPN) to effectively integrate multiresolution features and generate high-quality proposals. Then, a multiscale region of interest (RoI)-grid pooling (MSRGP) module is proposed to adaptively abstract proposal-specific features from sampled keypoints in multiple receptive fields. In addition, a cascade attentive module (CAM) is adopted to achieve incremental proposal refinement through multiple successive subnetworks. Our method reaches top performance among two-stage methods in the Cyclist and Pedestrian categories on the KITTI dataset while achieving real-time inference speed. Extensive experiments on the challenging roadside DAIR-V2X-I dataset also demonstrate that our method achieves superior detection performance.

Original language	English
Pages (from-to)	3066-3076
Number of pages	11
Journal	IEEE Transactions on Computational Social Systems
Volume	10
Issue number	6
DOI	https://doi.org/10.1109/TCSS.2023.3286543
Publication status	Published - 1 Dec 2023

Cite this

Feng, C., Xiang, C., Xie, X., Zhang, Y., Yang, M., & Li, X. (2023). HPV-RCNN: Hybrid Point-Voxel Two-Stage Network for LiDAR-Based 3-D Object Detection. IEEE Transactions on Computational Social Systems, 10(6), 3066-3076. https://doi.org/10.1109/TCSS.2023.3286543

@article{77bffaa0ee1f48539f7f52d3652caf52,

title = "HPV-RCNN: Hybrid Point-Voxel Two-Stage Network for LiDAR-Based 3-D Object Detection",

abstract = "The current two-stage detectors remarkably benefit from hybrid representation of points and 3-D voxels, but they have high time cost and leave room for improving the accuracy of small objects. On the contrary, 2-D voxel-based methods tend to have good efficiency and better performance for small objects. An intuitive idea of optimizing a two-stage algorithm is to use a 2-D voxel-based backbone. However, naive representation substitution cannot achieve optimal joint learning of each representation and may cause a decrease in accuracy. In this article, we propose hybrid point-voxel RCNN (HPV-RCNN), a novel point cloud detection network which combines the merits of points and 2-D voxels. First, we propose a multiattentive voxel feature encoding module (MAVFE) to exploit multilevel attention of multiscale voxels. We also present a partial fusion pyramid network (PFPN) to effectively integrate multiresolution features and generate high-quality proposals. Then, a multiscale region of interest (RoI)-grid pooling (MSRGP) module is proposed to adaptively abstract proposal-specific features from sampled keypoints in multiple receptive fields. In addition, a cascade attentive module (CAM) is adopted to achieve incrementally proposal refinement by subsequent multiple subnetworks. Our method reaches top performance among two-stage methods in Cyclist and Pedestrian categories on the KITTI dataset while achieving real-time inference speed. Extensive experiments on challenging roadside DAIR-V2X-I dataset also demonstrate that our method achieves superior detection performance.",

keywords = "3-D object detection, autonomous driving, feature fusion, point clouds",

author = "Chen Feng and Chao Xiang and Xiaopo Xie and Yuan Zhang and Mingchuan Yang and Xuesong Li",

note = "Publisher Copyright: {\textcopyright} 2014 IEEE.",

year = "2023",

month = dec,

day = "1",

doi = "10.1109/TCSS.2023.3286543",

language = "English",

volume = "10",

pages = "3066--3076",

journal = "IEEE Transactions on Computational Social Systems",

issn = "2329-924X",

publisher = "IEEE Systems, Man, and Cybernetics Society",

number = "6",

}

TY - JOUR

T1 - HPV-RCNN

T2 - Hybrid Point-Voxel Two-Stage Network for LiDAR-Based 3-D Object Detection

AU - Feng, Chen

AU - Xiang, Chao

AU - Xie, Xiaopo

AU - Zhang, Yuan

AU - Yang, Mingchuan

AU - Li, Xuesong

PY - 2023/12/1

Y1 - 2023/12/1

N2 - The current two-stage detectors remarkably benefit from hybrid representation of points and 3-D voxels, but they have high time cost and leave room for improving the accuracy of small objects. On the contrary, 2-D voxel-based methods tend to have good efficiency and better performance for small objects. An intuitive idea of optimizing a two-stage algorithm is to use a 2-D voxel-based backbone. However, naive representation substitution cannot achieve optimal joint learning of each representation and may cause a decrease in accuracy. In this article, we propose hybrid point-voxel RCNN (HPV-RCNN), a novel point cloud detection network which combines the merits of points and 2-D voxels. First, we propose a multiattentive voxel feature encoding module (MAVFE) to exploit multilevel attention of multiscale voxels. We also present a partial fusion pyramid network (PFPN) to effectively integrate multiresolution features and generate high-quality proposals. Then, a multiscale region of interest (RoI)-grid pooling (MSRGP) module is proposed to adaptively abstract proposal-specific features from sampled keypoints in multiple receptive fields. In addition, a cascade attentive module (CAM) is adopted to achieve incrementally proposal refinement by subsequent multiple subnetworks. Our method reaches top performance among two-stage methods in Cyclist and Pedestrian categories on the KITTI dataset while achieving real-time inference speed. Extensive experiments on challenging roadside DAIR-V2X-I dataset also demonstrate that our method achieves superior detection performance.

AB - The current two-stage detectors remarkably benefit from hybrid representation of points and 3-D voxels, but they have high time cost and leave room for improving the accuracy of small objects. On the contrary, 2-D voxel-based methods tend to have good efficiency and better performance for small objects. An intuitive idea of optimizing a two-stage algorithm is to use a 2-D voxel-based backbone. However, naive representation substitution cannot achieve optimal joint learning of each representation and may cause a decrease in accuracy. In this article, we propose hybrid point-voxel RCNN (HPV-RCNN), a novel point cloud detection network which combines the merits of points and 2-D voxels. First, we propose a multiattentive voxel feature encoding module (MAVFE) to exploit multilevel attention of multiscale voxels. We also present a partial fusion pyramid network (PFPN) to effectively integrate multiresolution features and generate high-quality proposals. Then, a multiscale region of interest (RoI)-grid pooling (MSRGP) module is proposed to adaptively abstract proposal-specific features from sampled keypoints in multiple receptive fields. In addition, a cascade attentive module (CAM) is adopted to achieve incrementally proposal refinement by subsequent multiple subnetworks. Our method reaches top performance among two-stage methods in Cyclist and Pedestrian categories on the KITTI dataset while achieving real-time inference speed. Extensive experiments on challenging roadside DAIR-V2X-I dataset also demonstrate that our method achieves superior detection performance.

KW - 3-D object detection

KW - autonomous driving

KW - feature fusion

KW - point clouds

UR - http://www.scopus.com/inward/record.url?scp=85164737233&partnerID=8YFLogxK

U2 - 10.1109/TCSS.2023.3286543

DO - 10.1109/TCSS.2023.3286543

M3 - Article

AN - SCOPUS:85164737233

SN - 2329-924X

VL - 10

SP - 3066

EP - 3076

JO - IEEE Transactions on Computational Social Systems

JF - IEEE Transactions on Computational Social Systems

IS - 6

ER -
