TY - JOUR
T1 - CatTrack
T2 - Single-Stage Category-Level 6D Object Pose Tracking via Convolution and Vision Transformer
AU - Yu, Sheng
AU - Zhai, Di Hua
AU - Xia, Yuanqing
AU - Li, Dong
AU - Zhao, Shiqi
N1 - Publisher Copyright:
© 1999-2012 IEEE.
PY - 2024
Y1 - 2024
AB - Much current research focuses on instance-level pose tracking, which requires a 3D model of the object in advance and is therefore difficult to apply in practice. To address this limitation, category-level object pose tracking methods have been proposed, and achieving accurate and fast monocular category-level pose tracking is an essential research goal. In this article, we propose CatTrack, a new single-stage, keypoint-based network for monocular category-level multi-object pose tracking. A key issue in object pose tracking is how to use information from the previous frame to guide pose estimation in the next frame. Because object poses and camera information differ from frame to frame, irrelevant information must be removed and useful features emphasized. To this end, we propose a transformer-based temporal information capture module that leverages the positions of keypoints from the previous frame. We also propose a new keypoint matching module that groups and matches object keypoints in complex scenes. We applied CatTrack to the Objectron dataset and achieved results superior to those of existing methods. Finally, we evaluated the generalization of CatTrack and successfully applied it to track the 6D pose of unseen real-world objects.
KW - Pose estimation
KW - pose tracking
KW - transformer
UR - http://www.scopus.com/inward/record.url?scp=85162662587&partnerID=8YFLogxK
U2 - 10.1109/TMM.2023.3284598
DO - 10.1109/TMM.2023.3284598
M3 - Article
AN - SCOPUS:85162662587
SN - 1520-9210
VL - 26
SP - 1665
EP - 1680
JO - IEEE Transactions on Multimedia
JF - IEEE Transactions on Multimedia
ER -