DVST: Deformable Voxel Set Transformer for 3D Object Detection from Point Clouds

Yaqian Ning; Jie Cao; Chun Bao; Qun Hao

doi:10.3390/rs15235612

Corresponding author: Jie Cao

School of Optics and Photonics

Research output: Contribution to journal › Article › peer-review

2 Citations (Scopus)

Abstract

The use of a transformer backbone in LiDAR point-cloud-based models for 3D object detection has recently gained significant interest. The larger receptive field of the transformer backbone improves its representation capability but also results in excessive attention being given to background regions. To solve this problem, we propose a novel approach called deformable voxel set attention, which we utilized to create a deformable voxel set transformer (DVST) backbone for 3D object detection from point clouds. The DVST aims to efficaciously integrate the flexible receptive field of the deformable mechanism and the powerful context modeling capability of the transformer. Specifically, we introduce the deformable mechanism into voxel-based set attention to selectively transfer candidate keys and values of foreground queries to important regions. An offset generation module was designed to learn the offsets of the foreground queries. Furthermore, a globally responsive convolutional feed-forward network with residual connection is presented to capture global feature interactions in hidden space. We verified the validity of the DVST on the KITTI and Waymo open datasets by constructing single-stage and two-stage models. The findings indicated that the DVST enhanced the average precision of the baseline model while preserving computational efficiency, achieving a performance comparable to state-of-the-art methods.

Original language	English
Article number	5612
Journal	Remote Sensing
Volume	15
Issue number	23
DOI	https://doi.org/10.3390/rs15235612
Publication status	Published - Dec 2023

Cite this

@article{e4ae2c535b02414bb8c01871e46e9d90,
  title = "DVST: Deformable Voxel Set Transformer for 3D Object Detection from Point Clouds",
  abstract = "The use of a transformer backbone in LiDAR point-cloud-based models for 3D object detection has recently gained significant interest. The larger receptive field of the transformer backbone improves its representation capability but also results in excessive attention being given to background regions. To solve this problem, we propose a novel approach called deformable voxel set attention, which we utilized to create a deformable voxel set transformer (DVST) backbone for 3D object detection from point clouds. The DVST aims to efficaciously integrate the flexible receptive field of the deformable mechanism and the powerful context modeling capability of the transformer. Specifically, we introduce the deformable mechanism into voxel-based set attention to selectively transfer candidate keys and values of foreground queries to important regions. An offset generation module was designed to learn the offsets of the foreground queries. Furthermore, a globally responsive convolutional feed-forward network with residual connection is presented to capture global feature interactions in hidden space. We verified the validity of the DVST on the KITTI and Waymo open datasets by constructing single-stage and two-stage models. The findings indicated that the DVST enhanced the average precision of the baseline model while preserving computational efficiency, achieving a performance comparable to state-of-the-art methods.",
  keywords = "3D object detection, deformable mechanism, point clouds, transformer",
  author = "Yaqian Ning and Jie Cao and Chun Bao and Qun Hao",
  note = "Publisher Copyright: {\textcopyright} 2023 by the authors.",
  year = "2023",
  month = dec,
  doi = "10.3390/rs15235612",
  language = "English",
  volume = "15",
  journal = "Remote Sensing",
  issn = "2072-4292",
  publisher = "Multidisciplinary Digital Publishing Institute (MDPI)",
  number = "23",
}

TY  - JOUR
T1  - DVST: Deformable Voxel Set Transformer for 3D Object Detection from Point Clouds
AU  - Ning, Yaqian
AU  - Cao, Jie
AU  - Bao, Chun
AU  - Hao, Qun
PY  - 2023/12
Y1  - 2023/12
AB  - The use of a transformer backbone in LiDAR point-cloud-based models for 3D object detection has recently gained significant interest. The larger receptive field of the transformer backbone improves its representation capability but also results in excessive attention being given to background regions. To solve this problem, we propose a novel approach called deformable voxel set attention, which we utilized to create a deformable voxel set transformer (DVST) backbone for 3D object detection from point clouds. The DVST aims to efficaciously integrate the flexible receptive field of the deformable mechanism and the powerful context modeling capability of the transformer. Specifically, we introduce the deformable mechanism into voxel-based set attention to selectively transfer candidate keys and values of foreground queries to important regions. An offset generation module was designed to learn the offsets of the foreground queries. Furthermore, a globally responsive convolutional feed-forward network with residual connection is presented to capture global feature interactions in hidden space. We verified the validity of the DVST on the KITTI and Waymo open datasets by constructing single-stage and two-stage models. The findings indicated that the DVST enhanced the average precision of the baseline model while preserving computational efficiency, achieving a performance comparable to state-of-the-art methods.
KW  - 3D object detection
KW  - deformable mechanism
KW  - point clouds
KW  - transformer
UR  - http://www.scopus.com/inward/record.url?scp=85178899137&partnerID=8YFLogxK
U2  - 10.3390/rs15235612
DO  - 10.3390/rs15235612
M3  - Article
AN  - SCOPUS:85178899137
SN  - 2072-4292
VL  - 15
JO  - Remote Sensing
JF  - Remote Sensing
IS  - 23
M1  - 5612
ER  -
