TY - JOUR
T1 - Skeleton-Based Pre-Training With Discrete Labels for Emotion Recognition in IoT Environments
AU - Zhang, Zhen
AU - Liang, Feng
AU - Wang, Wei
AU - Zeng, Runhao
AU - Leung, Victor C.M.
AU - Hu, Xiping
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - Self-supervised emotion recognition leveraging skeleton-based data offers a promising approach for classifying emotional expressions within the extensive amount of unlabeled data gathered by sensors in the Internet of Things (IoT). Recent advancements in this field have been driven by contrastive learning-based or generative learning-based self-supervised methods, which effectively tackle the issue of sparsely labeled data. In emotion recognition tasks, the high-level emotional semantics embedded in the skeleton data are more important than subtle joint movements. Compared with existing methods, discrete label prediction can encourage self-supervised learning (SSL) models to abstract high-level semantics in a manner similar to human perception. However, it is challenging to comprehensively capture the emotions expressed in skeleton data solely from joint-based features. Moreover, emotional information conveyed through body movements may include redundant details that hinder the understanding of emotional expression. To overcome these challenges, we propose a novel discrete-label-based emotion recognition framework named the Appendage-Informed Redundancy-ignoring (AIR) discrete label framework. First, we introduce the Appendage-Skeleton Partitioning (ASP) module, which leverages limb movement data from the original skeleton to explore emotional expression. Next, we propose the Appendage-refined Multi-scale Discrete Label (AMDL) module, which transforms traditional self-supervised tasks into classification tasks. This design continuously extracts emotional semantics from skeleton data during pre-training, functioning similarly to predicting categories and subsequently classifying samples. To further reduce the nonessential information in skeleton data that may negatively impact the generation of accurate emotional categories, we propose the Appendage Label Refinement (ALR) module, which refines the generated categories by using the relationships between the skeleton and the various appendages obtained via the ASP module. Finally, to maintain consistency across multiple scales, we introduce the Multi-Granularity Appendage Alignment (MGAA) method. By incorporating features from both coarse and fine scales, MGAA mitigates the encoder’s sensitivity to noise and enhances its overall robustness. We evaluate our approach on the Emilya, EGBM, and KDAE datasets, where it consistently outperforms state-of-the-art methods under various evaluation protocols.
AB - Self-supervised emotion recognition leveraging skeleton-based data offers a promising approach for classifying emotional expressions within the extensive amount of unlabeled data gathered by sensors in the Internet of Things (IoT). Recent advancements in this field have been driven by contrastive learning-based or generative learning-based self-supervised methods, which effectively tackle the issue of sparsely labeled data. In emotion recognition tasks, the high-level emotional semantics embedded in the skeleton data are more important than subtle joint movements. Compared with existing methods, discrete label prediction can encourage self-supervised learning (SSL) models to abstract high-level semantics in a manner similar to human perception. However, it is challenging to comprehensively capture the emotions expressed in skeleton data solely from joint-based features. Moreover, emotional information conveyed through body movements may include redundant details that hinder the understanding of emotional expression. To overcome these challenges, we propose a novel discrete-label-based emotion recognition framework named the Appendage-Informed Redundancy-ignoring (AIR) discrete label framework. First, we introduce the Appendage-Skeleton Partitioning (ASP) module, which leverages limb movement data from the original skeleton to explore emotional expression. Next, we propose the Appendage-refined Multi-scale Discrete Label (AMDL) module, which transforms traditional self-supervised tasks into classification tasks. This design continuously extracts emotional semantics from skeleton data during pre-training, functioning similarly to predicting categories and subsequently classifying samples. To further reduce the nonessential information in skeleton data that may negatively impact the generation of accurate emotional categories, we propose the Appendage Label Refinement (ALR) module, which refines the generated categories by using the relationships between the skeleton and the various appendages obtained via the ASP module. Finally, to maintain consistency across multiple scales, we introduce the Multi-Granularity Appendage Alignment (MGAA) method. By incorporating features from both coarse and fine scales, MGAA mitigates the encoder’s sensitivity to noise and enhances its overall robustness. We evaluate our approach on the Emilya, EGBM, and KDAE datasets, where it consistently outperforms state-of-the-art methods under various evaluation protocols.
KW - Affective Computing
KW - Internet of Things (IoT)
KW - Self-supervised
KW - Skeleton-based Emotion Analysis
UR - http://www.scopus.com/inward/record.url?scp=105006919055&partnerID=8YFLogxK
U2 - 10.1109/JIOT.2025.3574456
DO - 10.1109/JIOT.2025.3574456
M3 - Article
AN - SCOPUS:105006919055
SN - 2327-4662
JO - IEEE Internet of Things Journal
JF - IEEE Internet of Things Journal
ER -