Spatial Attention Frustum: A 3D Object Detection Method Focusing on Occluded Objects

Xinglei He; Xiaohan Zhang; Yichun Wang; Hongzeng Ji; Xiuhui Duan; Fen Guo

doi:10.3390/s22062366


Corresponding author for this work: Fen Guo

School of Mechanical Engineering

Beijing Institute of Technology

Research output: Contribution to journal › Article › peer-review


Abstract

Achieving accurate perception of occluded objects is a challenging problem for autonomous vehicles. Human vision can quickly locate important object regions in complex external scenes while analysing other regions only roughly or ignoring them, a behaviour known as the visual attention mechanism. The perception system of an autonomous vehicle, however, cannot know which part of the point cloud lies in the region of interest. It is therefore worth exploring how the visual attention mechanism can be used in the perception system of autonomous driving. In this paper, we propose the spatial attention frustum model to address object occlusion in 3D object detection. The spatial attention frustum suppresses unimportant features and allocates limited neural computing resources to the critical parts of the scene, providing more relevant and more easily processed input for higher-level perceptual reasoning tasks. To ensure that our method maintains good reasoning ability when faced with occluded objects that expose only part of their structure, we propose a local feature aggregation module that captures more complex local features of the point cloud. Finally, we discuss the projection constraint between the 3D bounding box and the 2D bounding box and propose a joint anchor box projection loss function, which helps improve the overall performance of our method. Results on the KITTI dataset show that the proposed method effectively improves the detection accuracy of occluded objects. Our method achieves 89.46%, 79.91%, and 75.53% detection accuracy at the easy, moderate, and hard difficulty levels of the car category, and achieves a 6.97% performance improvement in particular in the hard category, which has a high degree of occlusion. As a one-stage method, it does not rely on an additional refinement stage, yet its accuracy is comparable to that of two-stage methods.
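The abstract refers to a frustum that restricts processing to part of the point cloud and to a projection constraint between a 3D bounding box and its 2D bounding box. The Python sketch below illustrates the standard geometry behind both ideas under stated assumptions: a simple pinhole camera with hypothetical parameters `fx`, `fy`, `u0`, `v0` instead of KITTI's full 3x4 projection matrix, boxes given by geometric center, (length, height, width) and a yaw angle about the camera y axis, and a hypothetical consistency term 1 - IoU between the projected 3D box and a 2D box. It is not the authors' joint anchor box projection loss, whose exact form this abstract does not give, and every function name here is illustrative.

```python
import math
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]
Box2D = Tuple[float, float, float, float]  # (u_min, v_min, u_max, v_max) in pixels


def box3d_corners(center: Point3, size: Point3, yaw: float) -> List[Point3]:
    """Return the 8 corners of a 3D box in camera coordinates.

    Assumption: camera frame with x right, y down, z forward (as in KITTI),
    `center` is the geometric box center (KITTI labels store the bottom center),
    `size` is (length, height, width), and `yaw` rotates the box about the y axis.
    """
    length, height, width = size
    cx, cy, cz = center
    c, s = math.cos(yaw), math.sin(yaw)
    corners = []
    for dx in (length / 2, -length / 2):
        for dy in (height / 2, -height / 2):
            for dz in (width / 2, -width / 2):
                # Rotate the local offset about the y axis, then translate to the center.
                corners.append((cx + c * dx + s * dz, cy + dy, cz - s * dx + c * dz))
    return corners


def project(p: Point3, fx: float, fy: float, u0: float, v0: float) -> Tuple[float, float]:
    """Pinhole projection of a camera-frame point to pixel coordinates (u, v)."""
    x, y, z = p
    if z <= 0:
        raise ValueError("point is behind the camera")
    return fx * x / z + u0, fy * y / z + v0


def projected_box(corners: Sequence[Point3], fx: float, fy: float, u0: float, v0: float) -> Box2D:
    """Tightest 2D box around the projected corners (boxes crossing z = 0 are not handled)."""
    uv = [project(p, fx, fy, u0, v0) for p in corners]
    us, vs = [u for u, _ in uv], [v for _, v in uv]
    return min(us), min(vs), max(us), max(vs)


def iou_2d(a: Box2D, b: Box2D) -> float:
    """Intersection-over-union of two axis-aligned 2D boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def projection_consistency_loss(box_3d_proj: Box2D, box_2d: Box2D) -> float:
    """Hypothetical consistency term: 0 when the projected 3D box matches the 2D box."""
    return 1.0 - iou_2d(box_3d_proj, box_2d)


def frustum_mask(points: Sequence[Point3], box: Box2D,
                 fx: float, fy: float, u0: float, v0: float) -> List[bool]:
    """True for points in front of the camera whose projection falls inside `box`."""
    mask = []
    for x, y, z in points:
        if z <= 0:
            mask.append(False)
            continue
        u, v = project((x, y, z), fx, fy, u0, v0)
        mask.append(box[0] <= u <= box[2] and box[1] <= v <= box[3])
    return mask


if __name__ == "__main__":
    box = projected_box(box3d_corners((0.0, 0.0, 10.0), (4.0, 2.0, 2.0), 0.0),
                        100.0, 100.0, 50.0, 50.0)
    print(box)
    pts = [(0.0, 0.0, 10.0), (100.0, 0.0, 10.0), (0.0, 0.0, -5.0)]
    print(frustum_mask(pts, box, 100.0, 100.0, 50.0, 50.0))  # [True, False, False]
```

A plain frustum keeps every point that projects into the 2D box, including points on an occluder standing in front of the target, so some additional weighting of the points inside the frustum is needed when objects are partly hidden.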

Original language	English
Article number	2366
Journal	Sensors
Volume	22
Issue number	6
DOIs	https://doi.org/10.3390/s22062366
Publication status	Published - 1 Mar 2022

Keywords

3D object detection
Autonomous vehicles
Multi-sensor fusion
Occluded object detection
Visual attention mechanism


Cite this

He, X., Zhang, X., Wang, Y., Ji, H., Duan, X., & Guo, F. (2022). Spatial Attention Frustum: A 3D Object Detection Method Focusing on Occluded Objects. Sensors, 22(6), Article 2366. https://doi.org/10.3390/s22062366

@article{18cb9b2cbd6c40f58a7e678a68fd7861,

title = "Spatial Attention Frustum: A 3D Object Detection Method Focusing on Occluded Objects",

abstract = "Achieving accurate perception of occluded objects is a challenging problem for autonomous vehicles. Human vision can quickly locate important object regions in complex external scenes while analysing other regions only roughly or ignoring them, a behaviour known as the visual attention mechanism. The perception system of an autonomous vehicle, however, cannot know which part of the point cloud lies in the region of interest. It is therefore worth exploring how the visual attention mechanism can be used in the perception system of autonomous driving. In this paper, we propose the spatial attention frustum model to address object occlusion in 3D object detection. The spatial attention frustum suppresses unimportant features and allocates limited neural computing resources to the critical parts of the scene, providing more relevant and more easily processed input for higher-level perceptual reasoning tasks. To ensure that our method maintains good reasoning ability when faced with occluded objects that expose only part of their structure, we propose a local feature aggregation module that captures more complex local features of the point cloud. Finally, we discuss the projection constraint between the 3D bounding box and the 2D bounding box and propose a joint anchor box projection loss function, which helps improve the overall performance of our method. Results on the KITTI dataset show that the proposed method effectively improves the detection accuracy of occluded objects. Our method achieves 89.46%, 79.91%, and 75.53% detection accuracy at the easy, moderate, and hard difficulty levels of the car category, and achieves a 6.97% performance improvement in particular in the hard category, which has a high degree of occlusion. As a one-stage method, it does not rely on an additional refinement stage, yet its accuracy is comparable to that of two-stage methods.",

keywords = "3D object detection, Autonomous vehicles, Multi-sensor fusion, Occluded object detection, Visual attention mechanism",

author = "Xinglei He and Xiaohan Zhang and Yichun Wang and Hongzeng Ji and Xiuhui Duan and Fen Guo",

note = "Publisher Copyright: {\textcopyright} 2022 by the authors. Licensee MDPI, Basel, Switzerland.",

year = "2022",

month = mar,

day = "1",

doi = "10.3390/s22062366",

language = "English",

volume = "22",

journal = "Sensors",

issn = "1424-8220",

publisher = "Multidisciplinary Digital Publishing Institute (MDPI)",

number = "6",

}

TY - JOUR

T1 - Spatial Attention Frustum

T2 - A 3D Object Detection Method Focusing on Occluded Objects

AU - He, Xinglei

AU - Zhang, Xiaohan

AU - Wang, Yichun

AU - Ji, Hongzeng

AU - Duan, Xiuhui

AU - Guo, Fen

PY - 2022/3/1

Y1 - 2022/3/1

N2 - Achieving accurate perception of occluded objects is a challenging problem for autonomous vehicles. Human vision can quickly locate important object regions in complex external scenes while analysing other regions only roughly or ignoring them, a behaviour known as the visual attention mechanism. The perception system of an autonomous vehicle, however, cannot know which part of the point cloud lies in the region of interest. It is therefore worth exploring how the visual attention mechanism can be used in the perception system of autonomous driving. In this paper, we propose the spatial attention frustum model to address object occlusion in 3D object detection. The spatial attention frustum suppresses unimportant features and allocates limited neural computing resources to the critical parts of the scene, providing more relevant and more easily processed input for higher-level perceptual reasoning tasks. To ensure that our method maintains good reasoning ability when faced with occluded objects that expose only part of their structure, we propose a local feature aggregation module that captures more complex local features of the point cloud. Finally, we discuss the projection constraint between the 3D bounding box and the 2D bounding box and propose a joint anchor box projection loss function, which helps improve the overall performance of our method. Results on the KITTI dataset show that the proposed method effectively improves the detection accuracy of occluded objects. Our method achieves 89.46%, 79.91%, and 75.53% detection accuracy at the easy, moderate, and hard difficulty levels of the car category, and achieves a 6.97% performance improvement in particular in the hard category, which has a high degree of occlusion. As a one-stage method, it does not rely on an additional refinement stage, yet its accuracy is comparable to that of two-stage methods.

AB - Achieving accurate perception of occluded objects is a challenging problem for autonomous vehicles. Human vision can quickly locate important object regions in complex external scenes while analysing other regions only roughly or ignoring them, a behaviour known as the visual attention mechanism. The perception system of an autonomous vehicle, however, cannot know which part of the point cloud lies in the region of interest. It is therefore worth exploring how the visual attention mechanism can be used in the perception system of autonomous driving. In this paper, we propose the spatial attention frustum model to address object occlusion in 3D object detection. The spatial attention frustum suppresses unimportant features and allocates limited neural computing resources to the critical parts of the scene, providing more relevant and more easily processed input for higher-level perceptual reasoning tasks. To ensure that our method maintains good reasoning ability when faced with occluded objects that expose only part of their structure, we propose a local feature aggregation module that captures more complex local features of the point cloud. Finally, we discuss the projection constraint between the 3D bounding box and the 2D bounding box and propose a joint anchor box projection loss function, which helps improve the overall performance of our method. Results on the KITTI dataset show that the proposed method effectively improves the detection accuracy of occluded objects. Our method achieves 89.46%, 79.91%, and 75.53% detection accuracy at the easy, moderate, and hard difficulty levels of the car category, and achieves a 6.97% performance improvement in particular in the hard category, which has a high degree of occlusion. As a one-stage method, it does not rely on an additional refinement stage, yet its accuracy is comparable to that of two-stage methods.

KW - 3D object detection

KW - Autonomous vehicles

KW - Multi-sensor fusion

KW - Occluded object detection

KW - Visual attention mechanism

UR - http://www.scopus.com/inward/record.url?scp=85126836643&partnerID=8YFLogxK

U2 - 10.3390/s22062366

DO - 10.3390/s22062366

M3 - Article

C2 - 35336536

AN - SCOPUS:85126836643

SN - 1424-8220

VL - 22

JO - Sensors

JF - Sensors

IS - 6

M1 - 2366

ER -
