HPV-RCNN: Hybrid Point-Voxel Two-Stage Network for LiDAR-Based 3-D Object Detection

Chen Feng; Chao Xiang; Xiaopo Xie; Yuan Zhang; Mingchuan Yang; Xuesong Li

doi:10.1109/TCSS.2023.3286543

Corresponding author: Xuesong Li

School of Computer Science and Technology

Research output: Contribution to journal › Article › peer-review

4 Citations (Scopus)

Abstract

Current two-stage detectors benefit remarkably from a hybrid representation of points and 3-D voxels, but they have a high time cost and leave room for improving accuracy on small objects. In contrast, 2-D voxel-based methods tend to be efficient and to perform better on small objects. An intuitive way to optimize a two-stage algorithm is to use a 2-D voxel-based backbone. However, naively substituting the representation cannot achieve optimal joint learning of each representation and may decrease accuracy. In this article, we propose hybrid point-voxel RCNN (HPV-RCNN), a novel point cloud detection network that combines the merits of points and 2-D voxels. First, we propose a multiattentive voxel feature encoding (MAVFE) module to exploit multilevel attention of multiscale voxels. We also present a partial fusion pyramid network (PFPN) to effectively integrate multiresolution features and generate high-quality proposals. Then, a multiscale region of interest (RoI)-grid pooling (MSRGP) module is proposed to adaptively abstract proposal-specific features from sampled keypoints in multiple receptive fields. In addition, a cascade attentive module (CAM) is adopted to refine proposals incrementally through multiple successive subnetworks. Our method reaches top performance among two-stage methods in the Cyclist and Pedestrian categories on the KITTI dataset while achieving real-time inference speed. Extensive experiments on the challenging roadside DAIR-V2X-I dataset also demonstrate that our method achieves superior detection performance.

Original language	English
Pages (from-to)	3066-3076
Number of pages	11
Journal	IEEE Transactions on Computational Social Systems
Volume	10
Issue number	6
DOIs	https://doi.org/10.1109/TCSS.2023.3286543
Publication status	Published - 1 Dec 2023

Keywords

3-D object detection
autonomous driving
feature fusion
point clouds

Access to Document

10.1109/TCSS.2023.3286543

Cite this

@article{77bffaa0ee1f48539f7f52d3652caf52,

title = "HPV-RCNN: Hybrid Point-Voxel Two-Stage Network for LiDAR-Based 3-D Object Detection",

abstract = "Current two-stage detectors benefit remarkably from a hybrid representation of points and 3-D voxels, but they have a high time cost and leave room for improving accuracy on small objects. In contrast, 2-D voxel-based methods tend to be efficient and to perform better on small objects. An intuitive way to optimize a two-stage algorithm is to use a 2-D voxel-based backbone. However, naively substituting the representation cannot achieve optimal joint learning of each representation and may decrease accuracy. In this article, we propose hybrid point-voxel RCNN (HPV-RCNN), a novel point cloud detection network that combines the merits of points and 2-D voxels. First, we propose a multiattentive voxel feature encoding (MAVFE) module to exploit multilevel attention of multiscale voxels. We also present a partial fusion pyramid network (PFPN) to effectively integrate multiresolution features and generate high-quality proposals. Then, a multiscale region of interest (RoI)-grid pooling (MSRGP) module is proposed to adaptively abstract proposal-specific features from sampled keypoints in multiple receptive fields. In addition, a cascade attentive module (CAM) is adopted to refine proposals incrementally through multiple successive subnetworks. Our method reaches top performance among two-stage methods in the Cyclist and Pedestrian categories on the KITTI dataset while achieving real-time inference speed. Extensive experiments on the challenging roadside DAIR-V2X-I dataset also demonstrate that our method achieves superior detection performance.",

keywords = "3-D object detection, autonomous driving, feature fusion, point clouds",

author = "Chen Feng and Chao Xiang and Xiaopo Xie and Yuan Zhang and Mingchuan Yang and Xuesong Li",

note = "Publisher Copyright: {\textcopyright} 2014 IEEE.",

year = "2023",

month = dec,

day = "1",

doi = "10.1109/TCSS.2023.3286543",

language = "English",

volume = "10",

pages = "3066--3076",

journal = "IEEE Transactions on Computational Social Systems",

issn = "2329-924X",

publisher = "IEEE Systems, Man, and Cybernetics Society",

number = "6",

}

TY - JOUR

T1 - HPV-RCNN: Hybrid Point-Voxel Two-Stage Network for LiDAR-Based 3-D Object Detection

AU - Feng, Chen

AU - Xiang, Chao

AU - Xie, Xiaopo

AU - Zhang, Yuan

AU - Yang, Mingchuan

AU - Li, Xuesong

PY - 2023/12/1

Y1 - 2023/12/1

AB - Current two-stage detectors benefit remarkably from a hybrid representation of points and 3-D voxels, but they have a high time cost and leave room for improving accuracy on small objects. In contrast, 2-D voxel-based methods tend to be efficient and to perform better on small objects. An intuitive way to optimize a two-stage algorithm is to use a 2-D voxel-based backbone. However, naively substituting the representation cannot achieve optimal joint learning of each representation and may decrease accuracy. In this article, we propose hybrid point-voxel RCNN (HPV-RCNN), a novel point cloud detection network that combines the merits of points and 2-D voxels. First, we propose a multiattentive voxel feature encoding (MAVFE) module to exploit multilevel attention of multiscale voxels. We also present a partial fusion pyramid network (PFPN) to effectively integrate multiresolution features and generate high-quality proposals. Then, a multiscale region of interest (RoI)-grid pooling (MSRGP) module is proposed to adaptively abstract proposal-specific features from sampled keypoints in multiple receptive fields. In addition, a cascade attentive module (CAM) is adopted to refine proposals incrementally through multiple successive subnetworks. Our method reaches top performance among two-stage methods in the Cyclist and Pedestrian categories on the KITTI dataset while achieving real-time inference speed. Extensive experiments on the challenging roadside DAIR-V2X-I dataset also demonstrate that our method achieves superior detection performance.

KW - 3-D object detection

KW - autonomous driving

KW - feature fusion

KW - point clouds

UR - http://www.scopus.com/inward/record.url?scp=85164737233&partnerID=8YFLogxK

U2 - 10.1109/TCSS.2023.3286543

DO - 10.1109/TCSS.2023.3286543

M3 - Article

AN - SCOPUS:85164737233

SN - 2329-924X

VL - 10

SP - 3066

EP - 3076

JO - IEEE Transactions on Computational Social Systems

JF - IEEE Transactions on Computational Social Systems

IS - 6

ER -
