CatFormer: Category-Level 6D Object Pose Estimation with Transformer

Sheng Yu; Di Hua Zhai; Yuanqing Xia

doi:10.1609/aaai.v38i7.28505

Sheng Yu, Di Hua Zhai*, Yuanqing Xia

*Corresponding author for this work

School of Automation

Beijing Institute of Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Although category-level object pose estimation has progressed significantly in recent years, there is still considerable room for improvement. In this paper, we propose CatFormer, a novel transformer-based category-level 6D pose estimation method designed to improve pose estimation accuracy. CatFormer comprises three main parts: a coarse deformation part, a fine deformation part, and a recurrent refinement part. In the coarse and fine deformation parts, we introduce a transformer-based deformation module that performs point cloud deformation and completion in the feature space. Additionally, after each deformation, we incorporate a transformer-based graph module that adjusts the fused features and establishes geometric and topological relationships between points based on these features. Furthermore, we present an end-to-end recurrent refinement module that enables the prior point cloud to deform multiple times according to real scene features. We evaluate CatFormer by training and testing it on the CAMERA25 and REAL275 datasets. Experimental results demonstrate that CatFormer surpasses state-of-the-art methods. Moreover, we extend CatFormer to instance-level object pose estimation on the LINEMOD dataset, as well as to object pose estimation in real-world scenarios. The experimental results validate the effectiveness and generalization capabilities of CatFormer. Our code and the supplemental materials are available at https://github.com/BIT-robot-group/CatFormer.
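The recurrent refinement idea described above, in which a prior point cloud is deformed several times according to the observed scene, can be illustrated with a deliberately simplified toy sketch. It is not CatFormer's implementation (the authors' code is in the linked repository) and rests on loud assumptions: it works on raw 3D coordinates rather than learned features, it replaces learned transformer attention with a softmax over negative squared distances with no trained weights, and it omits shape completion and the graph module entirely. The helper names `attention_offsets`, `recurrent_refine` and `mean_nn_distance`, and the `temperature` and `step` parameters, are hypothetical.

```python
import math
from itertools import product
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def attention_offsets(shape: Sequence[Point], observed: Sequence[Point],
                      temperature: float = 0.05) -> List[Point]:
    """Hypothetical helper: offset from each shape point to its attention target.

    Assumption: attention weights are softmax(-||p - o||^2 / temperature) over
    raw coordinates, standing in for learned query/key projections.
    """
    offsets: List[Point] = []
    for p in shape:
        logits = [-sum((a - b) ** 2 for a, b in zip(p, o)) / temperature
                  for o in observed]
        top = max(logits)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in logits]
        total = sum(weights)
        # Attention output: weighted average of the observed points (values).
        target = [sum(w * o[k] for w, o in zip(weights, observed)) / total
                  for k in range(3)]
        offsets.append((target[0] - p[0], target[1] - p[1], target[2] - p[2]))
    return offsets


def recurrent_refine(prior: Sequence[Point], observed: Sequence[Point],
                     iterations: int = 3, step: float = 0.5) -> List[Point]:
    """Hypothetical helper: deform the prior repeatedly, re-attending each pass."""
    shape: List[Point] = [tuple(p) for p in prior]
    for _ in range(iterations):
        offsets = attention_offsets(shape, observed)
        shape = [(p[0] + step * d[0], p[1] + step * d[1], p[2] + step * d[2])
                 for p, d in zip(shape, offsets)]
    return shape


def mean_nn_distance(points: Sequence[Point], reference: Sequence[Point]) -> float:
    """Average distance from each point to its nearest reference point."""
    return sum(min(math.dist(p, r) for r in reference) for p in points) / len(points)


# Toy data: the observation is a 2x2x2 cube, the "category prior" a smaller cube.
observed: List[Point] = list(product((-1.0, 1.0), repeat=3))
prior: List[Point] = list(product((-0.5, 0.5), repeat=3))
refined = recurrent_refine(prior, observed)
print(round(mean_nn_distance(prior, observed), 3))    # 0.866
print(round(mean_nn_distance(refined, observed), 3))  # 0.108
```

Each pass recomputes attention from the current, already deformed shape, which is what makes the refinement recurrent rather than a single fixed offset. In this toy cube example the softmax is nearly one-hot, so with `step=0.5` each pass halves every point's remaining gap to its matching observed corner, and three passes reduce the mean nearest-neighbor distance to one eighth of its initial value.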

Original language	English
Title of host publication	Technical Tracks 14
Editors	Michael Wooldridge, Jennifer Dy, Sriraam Natarajan
Publisher	Association for the Advancement of Artificial Intelligence
Pages	6808-6816
Number of pages	9
Edition	7
ISBN (Electronic)	1577358872, 9781577358879
DOIs	https://doi.org/10.1609/aaai.v38i7.28505
Publication status	Published - 25 Mar 2024
Event	38th AAAI Conference on Artificial Intelligence, AAAI 2024 - Vancouver, Canada. Duration: 20 Feb 2024 → 27 Feb 2024

Publication series

Name	Proceedings of the AAAI Conference on Artificial Intelligence
Number	7
Volume	38
ISSN (Print)	2159-5399
ISSN (Electronic)	2374-3468

Conference

Conference	38th AAAI Conference on Artificial Intelligence, AAAI 2024
Country/Territory	Canada
City	Vancouver
Period	20/02/24 → 27/02/24

Cite this

Yu, S., Zhai, D. H., & Xia, Y. (2024). CatFormer: Category-Level 6D Object Pose Estimation with Transformer. In M. Wooldridge, J. Dy, & S. Natarajan (Eds.), Technical Tracks 14 (7 ed., pp. 6808-6816). (Proceedings of the AAAI Conference on Artificial Intelligence; Vol. 38, No. 7). Association for the Advancement of Artificial Intelligence. https://doi.org/10.1609/aaai.v38i7.28505

@inproceedings{1dbb44ad0ca540cc9f4a0211eb45ecc5,

title = "CatFormer: Category-Level 6D Object Pose Estimation with Transformer",

abstract = "Although category-level object pose estimation has progressed significantly in recent years, there is still considerable room for improvement. In this paper, we propose CatFormer, a novel transformer-based category-level 6D pose estimation method designed to improve pose estimation accuracy. CatFormer comprises three main parts: a coarse deformation part, a fine deformation part, and a recurrent refinement part. In the coarse and fine deformation parts, we introduce a transformer-based deformation module that performs point cloud deformation and completion in the feature space. Additionally, after each deformation, we incorporate a transformer-based graph module that adjusts the fused features and establishes geometric and topological relationships between points based on these features. Furthermore, we present an end-to-end recurrent refinement module that enables the prior point cloud to deform multiple times according to real scene features. We evaluate CatFormer by training and testing it on the CAMERA25 and REAL275 datasets. Experimental results demonstrate that CatFormer surpasses state-of-the-art methods. Moreover, we extend CatFormer to instance-level object pose estimation on the LINEMOD dataset, as well as to object pose estimation in real-world scenarios. The experimental results validate the effectiveness and generalization capabilities of CatFormer. Our code and the supplemental materials are available at https://github.com/BIT-robot-group/CatFormer.",

author = "Sheng Yu and Zhai, {Di Hua} and Yuanqing Xia",

note = "Publisher Copyright: Copyright {\textcopyright} 2024, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.; 38th AAAI Conference on Artificial Intelligence, AAAI 2024; Conference date: 20-02-2024 through 27-02-2024",

year = "2024",

month = mar,

day = "25",

doi = "10.1609/aaai.v38i7.28505",

language = "English",

series = "Proceedings of the AAAI Conference on Artificial Intelligence",

publisher = "Association for the Advancement of Artificial Intelligence",

volume = "38",

number = "7",

pages = "6808--6816",

editor = "Michael Wooldridge and Jennifer Dy and Sriraam Natarajan",

booktitle = "Technical Tracks 14",

edition = "7",

}

Yu, S, Zhai, DH & Xia, Y 2024, CatFormer: Category-Level 6D Object Pose Estimation with Transformer. in M Wooldridge, J Dy & S Natarajan (eds), Technical Tracks 14. 7 edn, Proceedings of the AAAI Conference on Artificial Intelligence, no. 7, vol. 38, Association for the Advancement of Artificial Intelligence, pp. 6808-6816, 38th AAAI Conference on Artificial Intelligence, AAAI 2024, Vancouver, Canada, 20/02/24. https://doi.org/10.1609/aaai.v38i7.28505

CatFormer: Category-Level 6D Object Pose Estimation with Transformer. / Yu, Sheng; Zhai, Di Hua; Xia, Yuanqing.
Technical Tracks 14. ed. / Michael Wooldridge; Jennifer Dy; Sriraam Natarajan. 7. ed. Association for the Advancement of Artificial Intelligence, 2024. p. 6808-6816 (Proceedings of the AAAI Conference on Artificial Intelligence; Vol. 38, No. 7).

TY - GEN

T1 - CatFormer: Category-Level 6D Object Pose Estimation with Transformer

T2 - 38th AAAI Conference on Artificial Intelligence, AAAI 2024

AU - Yu, Sheng

AU - Zhai, Di Hua

AU - Xia, Yuanqing

PY - 2024/3/25

Y1 - 2024/3/25

AB - Although category-level object pose estimation has progressed significantly in recent years, there is still considerable room for improvement. In this paper, we propose CatFormer, a novel transformer-based category-level 6D pose estimation method designed to improve pose estimation accuracy. CatFormer comprises three main parts: a coarse deformation part, a fine deformation part, and a recurrent refinement part. In the coarse and fine deformation parts, we introduce a transformer-based deformation module that performs point cloud deformation and completion in the feature space. Additionally, after each deformation, we incorporate a transformer-based graph module that adjusts the fused features and establishes geometric and topological relationships between points based on these features. Furthermore, we present an end-to-end recurrent refinement module that enables the prior point cloud to deform multiple times according to real scene features. We evaluate CatFormer by training and testing it on the CAMERA25 and REAL275 datasets. Experimental results demonstrate that CatFormer surpasses state-of-the-art methods. Moreover, we extend CatFormer to instance-level object pose estimation on the LINEMOD dataset, as well as to object pose estimation in real-world scenarios. The experimental results validate the effectiveness and generalization capabilities of CatFormer. Our code and the supplemental materials are available at https://github.com/BIT-robot-group/CatFormer.

UR - http://www.scopus.com/inward/record.url?scp=85189538338&partnerID=8YFLogxK

U2 - 10.1609/aaai.v38i7.28505

DO - 10.1609/aaai.v38i7.28505

M3 - Conference contribution

AN - SCOPUS:85189538338

T3 - Proceedings of the AAAI Conference on Artificial Intelligence

SP - 6808

EP - 6816

BT - Technical Tracks 14

A2 - Wooldridge, Michael

A2 - Dy, Jennifer

A2 - Natarajan, Sriraam

PB - Association for the Advancement of Artificial Intelligence

Y2 - 20 February 2024 through 27 February 2024

ER -
