Voxel Transformer with Shifted Windows for 3D Object Detection

Chencheng Luo; Xiangzhou Wang; Ziling Zhao; Shuhua Zheng

doi:10.1109/CAC59555.2023.10450632

Corresponding author: Shuhua Zheng

School of Automation

Beijing Institute of Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-reviewed

Abstract

Recent three-dimensional object detection methods are typically classified as point-based or voxel-based according to how they process raw point clouds. Voxel-based methods, which convert point clouds to voxels to reduce computational load, often suffer from geometric information loss and limited detection accuracy. In this paper, we propose a novel single-stage, voxel-based 3D object detection algorithm (VWTr) that uses a Voxel Feature Encoder to extract features and a Transformer Backbone with shifted windows to strengthen feature extraction, achieving a balance between accuracy and speed. The Transformer Backbone with shifted windows helps the network efficiently concentrate on global information and compensates for the geometric information loss arising from the voxelization operation of the voxel feature encoder. To this end, we design a feature aggregation operation to enhance the network's representation capability. Experiments on KITTI demonstrate that our method respectively reaches 84.11%, 75.18%, 69.53%
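The two ideas the abstract relies on, voxelization and shifted windows, can be illustrated with a minimal NumPy sketch. This is not the VWTr implementation: the function names `voxelize` and `window_ids`, the centroid-per-voxel feature, and the cubic window layout are all assumptions made for illustration only.

```python
import numpy as np

def voxelize(points, voxel_size, grid_min):
    """Group 3D points into voxels and return one centroid per occupied voxel.

    Hypothetical helper: averaging is a stand-in for a learned voxel encoder.
    """
    # Integer voxel coordinate of every point.
    idx = np.floor((points - grid_min) / voxel_size).astype(np.int64)
    # Unique occupied voxels and, for each point, the voxel it falls in.
    uniq, inverse = np.unique(idx, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)
    # Sum the points of each voxel, then divide by the point count.
    sums = np.zeros((len(uniq), 3))
    np.add.at(sums, inverse, points)
    counts = np.bincount(inverse, minlength=len(uniq))
    return uniq, sums / counts[:, None]

def window_ids(voxel_idx, window, shift=0):
    """Assign each voxel to a cubic window; `shift` offsets the window grid."""
    return (voxel_idx + shift) // window

# Four voxels in a row along x, grouped into windows of size 2.
v = np.array([[0, 0, 0], [1, 0, 0], [2, 0, 0], [3, 0, 0]])
regular = window_ids(v, window=2)           # x windows: 0, 0, 1, 1
shifted = window_ids(v, window=2, shift=1)  # x windows: 0, 1, 1, 2
```

In the shifted layout, voxels 1 and 2 share a window even though the regular layout separates them. Alternating the two layouts across layers is how shifted-window attention lets information cross window boundaries while each attention step stays local.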

Original language	English
Title of host publication	Proceedings - 2023 China Automation Congress, CAC 2023
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	2717-2721
Number of pages	5
ISBN (Electronic)	9798350303759
DOI	https://doi.org/10.1109/CAC59555.2023.10450632
Publication status	Published - 2023
Event	2023 China Automation Congress, CAC 2023 - Chongqing, China. Duration: 17 Nov 2023 → 19 Nov 2023

Publication series

Name	Proceedings - 2023 China Automation Congress, CAC 2023

Conference

Conference	2023 China Automation Congress, CAC 2023
Country/Territory	China
City	Chongqing
Period	17/11/23 → 19/11/23


Cite this

@inproceedings{eecc32f6f8f3434e8ef24e5e2ff05505,
  title = "Voxel Transformer with Shifted Windows for 3D Object Detection",
  abstract = "Recent three-dimensional object detection methods are typically classified as point-based or voxel-based according to how they process raw point clouds. Voxel-based methods, which convert point clouds to voxels to reduce computational load, often suffer from geometric information loss and limited detection accuracy. In this paper, we propose a novel single-stage, voxel-based 3D object detection algorithm (VWTr) that uses a Voxel Feature Encoder to extract features and a Transformer Backbone with shifted windows to strengthen feature extraction, achieving a balance between accuracy and speed. The Transformer Backbone with shifted windows helps the network efficiently concentrate on global information and compensates for the geometric information loss arising from the voxelization operation of the voxel feature encoder. To this end, we design a feature aggregation operation to enhance the network's representation capability. Experiments on KITTI demonstrate that our method respectively reaches 84.11%, 75.18%, 69.53%",
  keywords = "3D object detection, point cloud, vision transformer",
  author = "Chencheng Luo and Xiangzhou Wang and Ziling Zhao and Shuhua Zheng",
  note = "Publisher Copyright: {\textcopyright} 2023 IEEE.; 2023 China Automation Congress, CAC 2023 ; Conference date: 17-11-2023 Through 19-11-2023",
  year = "2023",
  doi = "10.1109/CAC59555.2023.10450632",
  language = "English",
  series = "Proceedings - 2023 China Automation Congress, CAC 2023",
  publisher = "Institute of Electrical and Electronics Engineers Inc.",
  pages = "2717--2721",
  booktitle = "Proceedings - 2023 China Automation Congress, CAC 2023",
  address = "United States",
}

Luo, C, Wang, X, Zhao, Z & Zheng, S 2023, Voxel Transformer with Shifted Windows for 3D Object Detection. in Proceedings - 2023 China Automation Congress, CAC 2023. Proceedings - 2023 China Automation Congress, CAC 2023, Institute of Electrical and Electronics Engineers Inc., pp. 2717-2721, 2023 China Automation Congress, CAC 2023, Chongqing, China, 17/11/23. https://doi.org/10.1109/CAC59555.2023.10450632

Voxel Transformer with Shifted Windows for 3D Object Detection. / Luo, Chencheng; Wang, Xiangzhou; Zhao, Ziling et al.
Proceedings - 2023 China Automation Congress, CAC 2023. Institute of Electrical and Electronics Engineers Inc., 2023. p. 2717-2721 (Proceedings - 2023 China Automation Congress, CAC 2023).

TY  - GEN
T1  - Voxel Transformer with Shifted Windows for 3D Object Detection
AU  - Luo, Chencheng
AU  - Wang, Xiangzhou
AU  - Zhao, Ziling
AU  - Zheng, Shuhua
PY  - 2023
Y1  - 2023
AB  - Recent three-dimensional object detection methods are typically classified as point-based or voxel-based according to how they process raw point clouds. Voxel-based methods, which convert point clouds to voxels to reduce computational load, often suffer from geometric information loss and limited detection accuracy. In this paper, we propose a novel single-stage, voxel-based 3D object detection algorithm (VWTr) that uses a Voxel Feature Encoder to extract features and a Transformer Backbone with shifted windows to strengthen feature extraction, achieving a balance between accuracy and speed. The Transformer Backbone with shifted windows helps the network efficiently concentrate on global information and compensates for the geometric information loss arising from the voxelization operation of the voxel feature encoder. To this end, we design a feature aggregation operation to enhance the network's representation capability. Experiments on KITTI demonstrate that our method respectively reaches 84.11%, 75.18%, 69.53%
KW  - 3D object detection
KW  - point cloud
KW  - vision transformer
UR  - http://www.scopus.com/inward/record.url?scp=85189348472&partnerID=8YFLogxK
U2  - 10.1109/CAC59555.2023.10450632
DO  - 10.1109/CAC59555.2023.10450632
M3  - Conference contribution
AN  - SCOPUS:85189348472
T3  - Proceedings - 2023 China Automation Congress, CAC 2023
SP  - 2717
EP  - 2721
BT  - Proceedings - 2023 China Automation Congress, CAC 2023
PB  - Institute of Electrical and Electronics Engineers Inc.
T2  - 2023 China Automation Congress, CAC 2023
Y2  - 17 November 2023 through 19 November 2023
ER  - 
