MsSVT: Mixed-scale Sparse Voxel Transformer for 3D Object Detection on Point Clouds

Shaocong Dong, Lihe Ding, Haiyang Wang, Tingfa Xu*, Xinli Xu, Ziyang Bian, Ying Wang, Jie Wang, Jianan Li*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

22 Citations (Scopus)

Abstract

3D object detection from LiDAR point clouds is fundamental to autonomous driving. Large-scale outdoor scenes usually feature significant variance in instance scales, thus requiring features rich in both long-range and fine-grained information to support accurate detection. Recent detectors leverage window-based transformers to model long-range dependencies but tend to blur out fine-grained details. To close this gap, we present a novel Mixed-scale Sparse Voxel Transformer, named MsSVT, which captures both types of information simultaneously through a divide-and-conquer approach. Specifically, MsSVT explicitly divides attention heads into multiple groups, each in charge of attending to information within a particular range. The outputs of all groups are merged to obtain the final mixed-scale features. Moreover, we provide a novel chessboard sampling strategy to reduce the computational complexity of applying a window-based transformer in 3D voxel space. To improve efficiency, we also implement the voxel sampling and gathering operations sparsely with a hash map. Because it models mixed-scale information both effectively and efficiently, our single-stage detector built on top of MsSVT surprisingly outperforms state-of-the-art two-stage detectors on Waymo. Our project page: https://github.com/dscdyc/MsSVT.
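
The three mechanisms named in the abstract (hash-map voxel gathering, per-group attention ranges, and chessboard sampling) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: all function names, the cubic window shape, the radii, and the two-colour (x + y) parity pattern are assumptions made for illustration only, and the attention computation itself is omitted. For the actual method, see the project page.

```python
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[int, int, int]  # integer (x, y, z) index of a non-empty voxel


def build_voxel_hash(coords: Sequence[Coord]) -> Dict[Coord, int]:
    """Map each non-empty voxel coordinate to its row index in a feature array."""
    return {c: i for i, c in enumerate(coords)}


def gather_window(table: Dict[Coord, int], center: Coord, radius: int) -> List[int]:
    """Return row indices of non-empty voxels inside a cubic window around center.

    Empty voxels are skipped through a hash lookup, so no dense grid is stored.
    The cubic window shape is an assumption of this sketch.
    """
    cx, cy, cz = center
    found = []
    for dx in range(-radius, radius + 1):
        for dy in range(-radius, radius + 1):
            for dz in range(-radius, radius + 1):
                idx = table.get((cx + dx, cy + dy, cz + dz))
                if idx is not None:
                    found.append(idx)
    return found


def mixed_scale_keys(
    table: Dict[Coord, int], center: Coord, radii: Sequence[int]
) -> List[List[int]]:
    """Gather one key set per head group, each group with its own window radius."""
    return [gather_window(table, center, r) for r in radii]


def chessboard_queries(coords: Sequence[Coord], phase: int) -> List[Coord]:
    """Keep only voxels on one colour of an assumed (x + y) parity chessboard.

    Processing one colour at a time roughly halves the number of queries per pass.
    """
    return [c for c in coords if (c[0] + c[1]) % 2 == phase]


# Hypothetical toy scene: four nearby voxels and one distant voxel.
coords = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (0, 1, 0), (5, 5, 0)]
table = build_voxel_hash(coords)

# Group 0 looks at a small neighbourhood, group 1 at a large one.
small_keys, large_keys = mixed_scale_keys(table, (1, 0, 0), radii=(1, 5))
queries = chessboard_queries(coords, phase=0)
```

In this toy scene the small-radius group sees only the four nearby voxels, while the large-radius group also reaches the distant one. In a real detector each group's keys would feed its own attention heads before the outputs are merged.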

Original language: English
Title of host publication: Advances in Neural Information Processing Systems 35 - 36th Conference on Neural Information Processing Systems, NeurIPS 2022
Editors: S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, A. Oh
Publisher: Neural information processing systems foundation
ISBN (Electronic): 9781713871088
Publication status: Published - 2022
Event: 36th Conference on Neural Information Processing Systems, NeurIPS 2022 - New Orleans, United States
Duration: 28 Nov 2022 to 9 Dec 2022

Publication series

Name: Advances in Neural Information Processing Systems
Volume: 35
ISSN (Print): 1049-5258

Conference

Conference: 36th Conference on Neural Information Processing Systems, NeurIPS 2022
Country/Territory: United States
City: New Orleans
Period: 28/11/22 to 9/12/22
