Voxel Transformer with Shifted Windows for 3D Object Detection

Chencheng Luo, Xiangzhou Wang, Ziling Zhao, Shuhua Zheng*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Citation (Scopus)

Abstract

Recent three-dimensional object detection methods are typically classified as point-based or voxel-based, according to how they process raw point clouds. Voxel-based methods, which convert point clouds to voxels to reduce computational load, often suffer from loss of geometric information and limited detection accuracy. In this paper, we propose a novel single-stage, voxel-based 3D object detection algorithm (VWTr) that uses a Voxel Feature Encoder to extract features and a Transformer Backbone with shifted windows to strengthen feature extraction, achieving a balance between accuracy and speed. The Transformer Backbone with shifted windows helps the network efficiently attend to global information and compensates for the geometric information lost during the voxelization step of the Voxel Feature Encoder. To this end, we design a feature aggregation operation to enhance the network's representation capability. Experiments on KITTI demonstrate that our method respectively reaches 84.11%, 75.18%, 69.53%
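The two ideas named in the abstract, voxelization and shifted-window grouping, can be illustrated with a minimal NumPy sketch. This is not the authors' VWTr implementation: the function names `voxelize` and `window_ids` are hypothetical, per-voxel mean pooling stands in for the learned Voxel Feature Encoder, and no attention is computed. The sketch only shows how shifting the window grid changes which voxels are grouped together, which is what lets information cross window borders in alternating layers.

```python
import numpy as np

def voxelize(points, voxel_size, origin):
    """Group points into voxels; return integer voxel coordinates and
    the mean point per voxel (a stand-in for a learned feature encoder)."""
    coords = np.floor((points - origin) / voxel_size).astype(np.int64)
    voxels, inverse = np.unique(coords, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)  # flatten for consistency across NumPy versions
    sums = np.zeros((len(voxels), points.shape[1]))
    np.add.at(sums, inverse, points)            # accumulate points per voxel
    counts = np.bincount(inverse, minlength=len(voxels))
    return voxels, sums / counts[:, None]

def window_ids(voxels, window, shift=0):
    """Assign each voxel to a cubic window of `window` voxels per side.
    A non-zero `shift` offsets the window grid (assumed simplification
    of the shifted-window scheme; no cyclic wrap-around is applied)."""
    return np.floor_divide(voxels + shift, window)

# Hypothetical toy point cloud: four points along the x axis.
points = np.array([
    [0.1, 0.1, 0.1],
    [0.2, 0.2, 0.2],
    [1.1, 0.1, 0.1],
    [2.9, 0.1, 0.1],
])
voxels, centroids = voxelize(points, voxel_size=1.0, origin=np.zeros(3))
regular = window_ids(voxels, window=2)           # unshifted window grid
shifted = window_ids(voxels, window=2, shift=1)  # grid shifted by half a window
```

In this toy input, the unshifted grid places the voxels at x = 0 and x = 1 in the same window, while the shifted grid groups the voxels at x = 1 and x = 2 instead. Alternating the two groupings between layers is the general mechanism by which shifted-window transformers widen their receptive field without attending over all voxels at once.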

Original language: English
Title of host publication: Proceedings - 2023 China Automation Congress, CAC 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 2717-2721
Number of pages: 5
ISBN (Electronic): 9798350303759
Publication status: Published - 2023
Event: 2023 China Automation Congress, CAC 2023 - Chongqing, China
Duration: 17 Nov 2023 to 19 Nov 2023

Publication series

Name: Proceedings - 2023 China Automation Congress, CAC 2023

Conference

Conference: 2023 China Automation Congress, CAC 2023
Country/Territory: China
City: Chongqing
Period: 17/11/23 to 19/11/23

Keywords

  • 3D object detection
  • point cloud
  • vision transformer
