TY - JOUR
T1 - SD2-ReID
T2 - A semantic-stylistic decoupled distillation framework for robust multi-modal object re-identification
AU - Yan, Yonghao
AU - Gao, Meijing
AU - Bai, Yang
AU - Chen, Xu
AU - Sun, Bingzhou
AU - Sun, Huanyu
AU - Chen, Sibo
N1 - Publisher Copyright:
© 2026 Elsevier Ltd. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
PY - 2026/6
Y1 - 2026/6
N2 - The core challenge of multi-modal object re-identification (ReID) lies in reconciling the style discrepancies across modalities with the semantic consistency of identity. However, existing methods struggle to separate semantic features from modality-specific styles effectively, so semantic representations become contaminated by noise and recognition performance degrades. To address these issues, we propose a multi-modal re-identification framework based on semantic-stylistic decoupled distillation, named SD2-ReID (Semantic-Stylistic Decoupled Distillation for ReID), which aims to improve modal consistency and cross-modal semantic discrimination. First, we design a Hybrid Multi-modal Feature Extractor (HMFE) that employs a shared shallow structure and modality-specific deep branches to achieve fine-grained feature extraction, improving learning efficiency while preserving modality-specific characteristics. Second, we design a Decoupled Distillation Module (DDM) that explicitly separates semantic and stylistic features through dual constraints of semantic and style distillation, improving cross-modal semantic consistency and discriminative ability. Finally, we propose an attention-guided masking strategy and integrate intra-modal and cross-modal contrastive learning to construct a Hierarchical Self-supervised Learning Module (HSLM), enhancing the model’s robustness to local occlusions and style variations. Together, these components enhance semantic consistency, modal invariance, and feature robustness. Unlike existing methods, SD2-ReID does not require a dedicated multi-modal fusion module and introduces no additional overhead at inference, balancing recognition performance and inference efficiency. Experiments on three multi-modal object ReID benchmarks validate the effectiveness of our method.
AB - The core challenge of multi-modal object re-identification (ReID) lies in reconciling the style discrepancies across modalities with the semantic consistency of identity. However, existing methods struggle to separate semantic features from modality-specific styles effectively, so semantic representations become contaminated by noise and recognition performance degrades. To address these issues, we propose a multi-modal re-identification framework based on semantic-stylistic decoupled distillation, named SD2-ReID (Semantic-Stylistic Decoupled Distillation for ReID), which aims to improve modal consistency and cross-modal semantic discrimination. First, we design a Hybrid Multi-modal Feature Extractor (HMFE) that employs a shared shallow structure and modality-specific deep branches to achieve fine-grained feature extraction, improving learning efficiency while preserving modality-specific characteristics. Second, we design a Decoupled Distillation Module (DDM) that explicitly separates semantic and stylistic features through dual constraints of semantic and style distillation, improving cross-modal semantic consistency and discriminative ability. Finally, we propose an attention-guided masking strategy and integrate intra-modal and cross-modal contrastive learning to construct a Hierarchical Self-supervised Learning Module (HSLM), enhancing the model’s robustness to local occlusions and style variations. Together, these components enhance semantic consistency, modal invariance, and feature robustness. Unlike existing methods, SD2-ReID does not require a dedicated multi-modal fusion module and introduces no additional overhead at inference, balancing recognition performance and inference efficiency. Experiments on three multi-modal object ReID benchmarks validate the effectiveness of our method.
KW - Knowledge distillation
KW - Multi-modal re-identification
KW - Self-supervised learning
KW - Semantic-stylistic decoupling
KW - Vision transformer
UR - https://www.scopus.com/pages/publications/105032208871
U2 - 10.1016/j.neunet.2026.108719
DO - 10.1016/j.neunet.2026.108719
M3 - Article
C2 - 41691980
AN - SCOPUS:105032208871
SN - 0893-6080
VL - 198
JO - Neural Networks
JF - Neural Networks
M1 - 108719
ER -