TY - GEN
T1 - DecTrans
T2 - 6th Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2023
AU - Zhang, Yan
AU - Gao, Guangyu
AU - Wang, Qianxiang
AU - Ge, Jing
N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd 2024.
PY - 2024
Y1 - 2024
AB - Part-level features provide a more detailed representation and improve results in person re-identification (ReID). Yet existing works either rely on external tasks such as pose estimation or struggle to define part features, which limits the model’s learning capability. In this work, we propose the Decomposed Transformer (DecTrans), a transformer-based person ReID framework that exploits multifaceted part features. In particular, DecTrans extracts local features using the Vision Transformer (ViT) and then maps them into latent parts through a novel Token Decomposition (TD) layer. In the TD layer, ViT tokens are soft-clustered, and each token is decomposed into components based on its similarity to all cluster centroids. Token components referencing the same cluster are then regrouped to produce part features, thereby retaining more feature details. To ensure that tokens from different pedestrians that refer to the same part are clustered together, we propose removing identity (ID) information from tokens before clustering. We also propose a simple yet efficient data augmentation named Image Graying, which has been experimentally validated in conjunction with the TD layer. DecTrans achieves strong performance, e.g., mAP and Rank-1 of 70.8% and 87.1% on MSMT17 and 61.6% and 67.7% on Occluded-Duke, significantly outperforming state-of-the-art methods.
KW - Person ReID
KW - Vision Transformer
UR - http://www.scopus.com/inward/record.url?scp=85181978100&partnerID=8YFLogxK
U2 - 10.1007/978-981-99-8555-5_3
DO - 10.1007/978-981-99-8555-5_3
M3 - Conference contribution
AN - SCOPUS:85181978100
SN - 9789819985548
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 29
EP - 42
BT - Pattern Recognition and Computer Vision - 6th Chinese Conference, PRCV 2023, Proceedings
A2 - Liu, Qingshan
A2 - Wang, Hanzi
A2 - Ji, Rongrong
A2 - Ma, Zhanyu
A2 - Zheng, Weishi
A2 - Zha, Hongbin
A2 - Chen, Xilin
A2 - Wang, Liang
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 13 October 2023 through 15 October 2023
ER -