ReAGFormer: Reaggregation Transformer with Affine Group Features for 3D Object Detection

Chenguang Lu; Kang Yue; Yue Liu

doi:10.1007/978-3-031-26319-4_16

Chenguang Lu, Kang Yue, Yue Liu^*

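^*Corresponding author for this work

School of Optics and Photonics

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-reviewed

Abstract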

Direct detection of 3D objects from point clouds is challenging because point clouds are sparse and irregular. To capture point features from raw point clouds for 3D object detection, most previous studies use PointNet and its variants as the feature learning backbone and have achieved encouraging results. However, these methods capture point features independently without modeling the interaction between points, and simple symmetric functions cannot adequately aggregate local contextual features, which are vital for 3D object recognition. To address these limitations, we propose ReAGFormer, a reaggregation Transformer backbone with affine group features for point feature learning in 3D object detection, which captures the dependencies between points in the aligned group feature space while retaining flexible receptive fields. The key idea of ReAGFormer is to alleviate the perturbation of the point feature space by affine transformation and to extract the dependencies between points using self-attention, while reaggregating the local point set features with the learned attention. Moreover, we design multi-scale connections in the feature propagation layer to reduce the geometric information loss caused by point sampling and interpolation. Experimental results show that using our method as the backbone of existing 3D object detectors yields significant improvements over the original models and state-of-the-art performance on the SUN RGB-D and ScanNet V2 benchmarks.

Original language	English
Title of host publication	Computer Vision – ACCV 2022 - 16th Asian Conference on Computer Vision, 2022, Proceedings
Editors	Lei Wang, Juergen Gall, Tat-Jun Chin, Imari Sato, Rama Chellappa
Publisher	Springer Science and Business Media Deutschland GmbH
Pages	262-279
Number of pages	18
ISBN (Print)	9783031263187
DOI	https://doi.org/10.1007/978-3-031-26319-4_16
Publication status	Published - 2023
Event	16th Asian Conference on Computer Vision, ACCV 2022 - Macao, China; Duration: 4 Dec 2022 → 8 Dec 2022

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	13841 LNCS
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Conference

Conference	16th Asian Conference on Computer Vision, ACCV 2022
Country/Territory	China
City	Macao
Period	4/12/22 → 8/12/22

Cite this

Lu, C., Yue, K., & Liu, Y. (2023). ReAGFormer: Reaggregation Transformer with Affine Group Features for 3D Object Detection. In L. Wang, J. Gall, T.-J. Chin, I. Sato, & R. Chellappa (Eds.), Computer Vision – ACCV 2022 - 16th Asian Conference on Computer Vision, 2022, Proceedings (pp. 262-279). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 13841 LNCS). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-26319-4_16

Lu, Chenguang ; Yue, Kang ; Liu, Yue. / ReAGFormer : Reaggregation Transformer with Affine Group Features for 3D Object Detection. Computer Vision – ACCV 2022 - 16th Asian Conference on Computer Vision, 2022, Proceedings. editor / Lei Wang ; Juergen Gall ; Tat-Jun Chin ; Imari Sato ; Rama Chellappa. Springer Science and Business Media Deutschland GmbH, 2023. pp. 262-279 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

@inproceedings{7863fd5fd6d34f79a88fe21daf527ffd,

title = "ReAGFormer: Reaggregation Transformer with Affine Group Features for 3D Object Detection",

abstract = "Direct detection of 3D objects from point clouds is a challenging task due to sparsity and irregularity of point clouds. To capture point features from the raw point clouds for 3D object detection, most previous researches utilize PointNet and its variants as the feature learning backbone and have seen encouraging results. However, these methods capture point features independently without modeling the interaction between points, and simple symmetric functions cannot adequately aggregate local contextual features, which are vital for 3D object recognition. To address such limitations, we propose ReAGFormer, a reaggregation Transformer backbone with affine group features for point feature learning in 3D object detection, which can capture the dependencies between points on the aligned group feature space while retaining the flexible receptive fields. The key idea of ReAGFormer is to alleviate the perturbation of the point feature space by affine transformation and extract the dependencies between points using self-attention, while reaggregating the local point set features with the learned attention. Moreover, we also design multi-scale connections in the feature propagation layer to reduce the geometric information loss caused by point sampling and interpolation. Experimental results show that by equipping our method as the backbone for existing 3D object detectors, significant improvements and state-of-the-art performance are achieved over original models on SUN RGB-D and ScanNet V2 benchmarks.",

keywords = "3D object detection, Point cloud, Transformer",

author = "Chenguang Lu and Kang Yue and Yue Liu",

note = "Publisher Copyright: {\textcopyright} 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.; 16th Asian Conference on Computer Vision, ACCV 2022 ; Conference date: 04-12-2022 Through 08-12-2022",

year = "2023",

doi = "10.1007/978-3-031-26319-4_16",

language = "English",

isbn = "9783031263187",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

publisher = "Springer Science and Business Media Deutschland GmbH",

pages = "262--279",

editor = "Lei Wang and Juergen Gall and Tat-Jun Chin and Imari Sato and Rama Chellappa",

booktitle = "Computer Vision – ACCV 2022 - 16th Asian Conference on Computer Vision, 2022, Proceedings",

address = "Germany",

}

Lu, C, Yue, K & Liu, Y 2023, ReAGFormer: Reaggregation Transformer with Affine Group Features for 3D Object Detection. in L Wang, J Gall, T-J Chin, I Sato & R Chellappa (eds), Computer Vision – ACCV 2022 - 16th Asian Conference on Computer Vision, 2022, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 13841 LNCS, Springer Science and Business Media Deutschland GmbH, pp. 262-279, 16th Asian Conference on Computer Vision, ACCV 2022, Macao, China, 4/12/22. https://doi.org/10.1007/978-3-031-26319-4_16

ReAGFormer: Reaggregation Transformer with Affine Group Features for 3D Object Detection. / Lu, Chenguang; Yue, Kang; Liu, Yue.
Computer Vision – ACCV 2022 - 16th Asian Conference on Computer Vision, 2022, Proceedings. editor / Lei Wang; Juergen Gall; Tat-Jun Chin; Imari Sato; Rama Chellappa. Springer Science and Business Media Deutschland GmbH, 2023. pp. 262-279 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 13841 LNCS).

TY - GEN

T1 - ReAGFormer

T2 - 16th Asian Conference on Computer Vision, ACCV 2022

AU - Lu, Chenguang

AU - Yue, Kang

AU - Liu, Yue

PY - 2023

Y1 - 2023

N2 - Direct detection of 3D objects from point clouds is a challenging task due to sparsity and irregularity of point clouds. To capture point features from the raw point clouds for 3D object detection, most previous researches utilize PointNet and its variants as the feature learning backbone and have seen encouraging results. However, these methods capture point features independently without modeling the interaction between points, and simple symmetric functions cannot adequately aggregate local contextual features, which are vital for 3D object recognition. To address such limitations, we propose ReAGFormer, a reaggregation Transformer backbone with affine group features for point feature learning in 3D object detection, which can capture the dependencies between points on the aligned group feature space while retaining the flexible receptive fields. The key idea of ReAGFormer is to alleviate the perturbation of the point feature space by affine transformation and extract the dependencies between points using self-attention, while reaggregating the local point set features with the learned attention. Moreover, we also design multi-scale connections in the feature propagation layer to reduce the geometric information loss caused by point sampling and interpolation. Experimental results show that by equipping our method as the backbone for existing 3D object detectors, significant improvements and state-of-the-art performance are achieved over original models on SUN RGB-D and ScanNet V2 benchmarks.

AB - Direct detection of 3D objects from point clouds is a challenging task due to sparsity and irregularity of point clouds. To capture point features from the raw point clouds for 3D object detection, most previous researches utilize PointNet and its variants as the feature learning backbone and have seen encouraging results. However, these methods capture point features independently without modeling the interaction between points, and simple symmetric functions cannot adequately aggregate local contextual features, which are vital for 3D object recognition. To address such limitations, we propose ReAGFormer, a reaggregation Transformer backbone with affine group features for point feature learning in 3D object detection, which can capture the dependencies between points on the aligned group feature space while retaining the flexible receptive fields. The key idea of ReAGFormer is to alleviate the perturbation of the point feature space by affine transformation and extract the dependencies between points using self-attention, while reaggregating the local point set features with the learned attention. Moreover, we also design multi-scale connections in the feature propagation layer to reduce the geometric information loss caused by point sampling and interpolation. Experimental results show that by equipping our method as the backbone for existing 3D object detectors, significant improvements and state-of-the-art performance are achieved over original models on SUN RGB-D and ScanNet V2 benchmarks.

KW - 3D object detection

KW - Point cloud

KW - Transformer

UR - http://www.scopus.com/inward/record.url?scp=85151059036&partnerID=8YFLogxK

U2 - 10.1007/978-3-031-26319-4_16

DO - 10.1007/978-3-031-26319-4_16

M3 - Conference contribution

AN - SCOPUS:85151059036

SN - 9783031263187

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 262

EP - 279

BT - Computer Vision – ACCV 2022 - 16th Asian Conference on Computer Vision, 2022, Proceedings

A2 - Wang, Lei

A2 - Gall, Juergen

A2 - Chin, Tat-Jun

A2 - Sato, Imari

A2 - Chellappa, Rama

PB - Springer Science and Business Media Deutschland GmbH

Y2 - 4 December 2022 through 8 December 2022

ER -

Lu C, Yue K, Liu Y. ReAGFormer: Reaggregation Transformer with Affine Group Features for 3D Object Detection. In Wang L, Gall J, Chin TJ, Sato I, Chellappa R, editors, Computer Vision – ACCV 2022 - 16th Asian Conference on Computer Vision, 2022, Proceedings. Springer Science and Business Media Deutschland GmbH. 2023. p. 262-279. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-031-26319-4_16