基于点云数据的三维目标检测技术研究进展

Jianan Li; Ze Wang; Tingfa Xu

doi:10.3788/AOS230745

Translated title of the contribution: Three-Dimensional Object Detection Technology Based on Point Cloud Data

Jianan Li, Ze Wang, Tingfa Xu^*

^*Corresponding author for this work

School of Optics and Photonics

Beijing Institute of Technology

Research output: Contribution to journal › Review article › peer-review

9 Citations (Scopus)

Abstract

Significance In recent years, self-driving technology has attracted considerable attention from both academia and industry. Autonomous perception, which covers both the vehicle's own state and its surrounding environment, is a critical component of self-driving technology and guides the decision-making and planning modules. Perceiving the environment accurately requires detecting objects in three-dimensional (3D) scenes. However, traditional 3D object detection techniques are typically based on image data, which lack depth information, so image-based detection is difficult to apply to 3D scene tasks. 3D object detection therefore relies predominantly on point cloud data obtained from devices such as lidar and 3D scanners. A point cloud is a collection of points, each carrying coordinate information and additional attributes such as color, normal vector, and intensity. Point cloud data are rich in depth information. In contrast to two-dimensional images, however, they are sparse and unordered and have a complex, irregular structure, which makes feature extraction difficult. Traditional methods use local point cloud information such as curvature, normal vector, and density, combined with techniques such as the Gaussian model, to design descriptors by hand. These methods depend heavily on a priori knowledge and fail to account for the relationships between neighboring points, so they have low robustness and are susceptible to noise. In recent years, deep learning methods have gained significant attention from researchers because of their robust feature representation and generalization capabilities. The effectiveness of deep learning methods depends heavily on high-quality datasets. To advance point cloud object detection, numerous companies, such as Waymo and Baidu, as well as research institutes, have produced large-scale point cloud datasets. With the help of such datasets, point cloud object detection combined with deep learning has developed rapidly and demonstrated powerful performance. Despite this progress, challenges in accuracy and real-time performance remain. This paper therefore reviews the research on point cloud object detection and discusses future developments in order to promote the advancement of the field.

Progress The development of point cloud object detection has been significantly promoted by the recent emergence of large-scale open-source datasets. Several standard datasets have been released for outdoor scenes, including KITTI, Waymo, and nuScenes, and for indoor scenes, including NYU-Depth, SUN RGB-D, and ScanNet, and they have greatly facilitated research in this field. The relevant properties of these datasets are summarized in Table 1. Point cloud data are characterized by sparsity, non-uniformity, and disorder, which distinguish them from image data. To address these properties, researchers have developed a range of object detection algorithms designed specifically for this type of data. By feature extraction method, single-modal point cloud methods fall into four groups: voxel-based, point-based, graph-based, and point+voxel-based methods. Voxel-based methods divide the point cloud into regular voxel grids and aggregate the point features within each voxel to generate regular four-dimensional feature maps. VoxelNet, SECOND, and PointPillars are classic architectures of this kind. Point-based methods process the point cloud directly and use symmetric functions to aggregate point features while retaining the geometric information of the point cloud to the greatest extent. PointNet, PointNet++, and PointRCNN are their classic architectures. Graph-based methods convert the point cloud into a graph representation and process it with a graph neural network. Point-GNN and Graph R-CNN are classic architectures of this approach. Point+voxel-based methods combine the point-based and voxel-based approaches, with STD and PV-RCNN as classic architectures. In addition, to enrich the semantic information of point cloud data, researchers have used image data as a complementary source of information to design multi-modal methods. MV3D, AVOD, and MMF are classic multi-modal architectures. A chronological summary of classical methods for object detection from point clouds is presented in Fig. 4.

Conclusions and Prospects 3D object detection from point clouds is a significant research area in computer vision that is gaining increasing attention from scholars. The foundational branch of 3D object detection from point clouds has flourished, and future research may focus on several areas. These include multi-branch and multi-modal fusion, the integration of two-dimensional detection methods, weakly supervised and self-supervised learning, and the creation and utilization of complex datasets.
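
The two aggregation ideas named in the abstract, pooling point features inside regular voxels and pooling an unordered point set with a symmetric function, can be illustrated with a minimal sketch. This is not the implementation of any architecture cited above: the function names `voxelize_mean` and `max_pool`, the voxel size, and the sample points are all hypothetical, and raw coordinates stand in for the learned per-point features that real detectors compute before pooling.

```python
import math
from collections import defaultdict


def voxelize_mean(points, voxel_size):
    """Assign each 3D point to a regular voxel and mean-pool the points per voxel.

    Illustrative only: real voxel-based detectors pool learned features,
    not raw coordinates.
    """
    buckets = defaultdict(list)
    for x, y, z in points:
        # Integer voxel index along each axis.
        key = (
            math.floor(x / voxel_size),
            math.floor(y / voxel_size),
            math.floor(z / voxel_size),
        )
        buckets[key].append((x, y, z))
    # One aggregated vector per occupied voxel; empty voxels are never stored,
    # which reflects the sparsity of point clouds.
    return {
        key: tuple(sum(channel) / len(pts) for channel in zip(*pts))
        for key, pts in buckets.items()
    }


def max_pool(features):
    """Channel-wise max over a set of per-point feature vectors.

    Max is a symmetric function, so the result does not depend on point order.
    """
    return tuple(max(channel) for channel in zip(*features))


# Hypothetical sample data: two points fall in voxel (0, 0, 0), two in (1, 0, 0).
points = [(0.1, 0.2, 0.3), (0.4, 0.1, 0.2), (1.6, 0.2, 0.1), (1.9, 0.9, 0.8)]
voxels = voxelize_mean(points, voxel_size=1.0)
global_feature = max_pool(points)
shuffled_feature = max_pool(list(reversed(points)))
```

Here `voxels` holds one averaged vector per occupied voxel, and `global_feature` equals `shuffled_feature` because the channel-wise maximum ignores the order in which points arrive, which is the property that lets point-based methods consume unordered point sets directly.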

Translated title of the contribution	Three-Dimensional Object Detection Technology Based on Point Cloud Data
Original language	Chinese (Traditional)
Article number	1515001
Journal	Guangxue Xuebao/Acta Optica Sinica
Volume	43
Issue number	15
DOIs	https://doi.org/10.3788/AOS230745
Publication status	Published - Aug 2023

Cite this

@article{02f0cd89cdf3498d99d15290bcddf7d9,

title = "基于点云数据的三维目标检测技术研究进展",

keywords = "3D object detection, multi-modality, point cloud, single modality",

author = "Jianan Li and Ze Wang and Tingfa Xu",

year = "2023",

month = aug,

doi = "10.3788/AOS230745",

language = "Traditional Chinese",

volume = "43",

journal = "Guangxue Xuebao/Acta Optica Sinica",

issn = "0253-2239",

publisher = "Chinese Optical Society",

number = "15",

}

TY - JOUR

T1 - 基于点云数据的三维目标检测技术研究进展

AU - Li, Jianan

AU - Wang, Ze

AU - Xu, Tingfa

PY - 2023/8

Y1 - 2023/8

KW - 3D object detection

KW - multi-modality

KW - point cloud

KW - single modality

UR - http://www.scopus.com/inward/record.url?scp=85171628459&partnerID=8YFLogxK

U2 - 10.3788/AOS230745

DO - 10.3788/AOS230745

M3 - Review article

AN - SCOPUS:85171628459

SN - 0253-2239

VL - 43

JO - Guangxue Xuebao/Acta Optica Sinica

JF - Guangxue Xuebao/Acta Optica Sinica

IS - 15

M1 - 1515001

ER -
