Skeleton-Based Pre-Training With Discrete Labels for Emotion Recognition in IoT Environments

Zhen Zhang*, Feng Liang, Wei Wang, Runhao Zeng, Victor C.M. Leung, Xiping Hu

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Self-supervised emotion recognition leveraging skeleton-based data offers a promising approach for classifying emotional expressions within the extensive amount of unlabeled data gathered by sensors in the Internet of Things (IoT). Recent advancements in this field have been driven by contrastive learning-based or generative learning-based self-supervised methods, which effectively tackle the issue of sparsely labeled data. In emotion recognition tasks, the high-level emotional semantics embedded in the skeleton data are more important than subtle joint movements. Compared to existing methods, discrete label prediction can encourage self-supervised learning (SSL) models to abstract high-level semantics in a manner similar to human perception. However, it is challenging to comprehensively capture the emotions expressed in skeleton data solely from joint-based features. Moreover, emotional information conveyed through body movements may include redundant details that hinder the understanding of emotional expression. To overcome these challenges, we propose a novel discrete-label-based emotion recognition framework named the Appendage-Informed Redundancy-ignoring (AIR) discrete label framework. First, we introduce the Appendage-Skeleton Partitioning (ASP) module, which leverages limb movement data from the original skeleton to explore emotional expression. Next, we propose the Appendage-refined Multi-scale Discrete Label (AMDL) module, which transforms traditional self-supervised tasks into classification tasks. This design continuously extracts emotional semantics from skeleton data during pre-training, functioning similarly to predicting categories and subsequently classifying samples. To further reduce the nonessential information in skeleton data that may negatively impact the generation of accurate emotional categories, we propose the Appendage Label Refinement (ALR) module.
It refines the generated categories by using the relationships between the skeleton and the various appendages obtained via the ASP module. Finally, to maintain consistency across multiple scales, we introduce the Multi-Granularity Appendage Alignment (MGAA) method. By incorporating features from both coarse and fine scales, MGAA mitigates the encoder's sensitivity to noise and enhances its overall robustness. We evaluate our approach on the Emilya, EGBM, and KDAE datasets, where it consistently outperforms state-of-the-art methods under various evaluation protocols.
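The two core ideas above — partitioning the skeleton into appendages and turning pre-training into discrete-label prediction — can be illustrated with a minimal numpy sketch. This is not the paper's implementation: the joint indices, the hand-crafted displacement features, and the fixed codebook are all illustrative assumptions standing in for the learned ASP features and AMDL labels.

```python
import numpy as np

# Hypothetical joint indices for a 21-joint skeleton; the actual
# partitioning used by the ASP module is not specified here.
APPENDAGES = {
    "left_arm":  [4, 5, 6, 7],
    "right_arm": [8, 9, 10, 11],
    "left_leg":  [12, 13, 14, 15],
    "right_leg": [16, 17, 18, 19],
    "trunk":     [0, 1, 2, 3, 20],
}

def appendage_features(skeleton, groups=APPENDAGES):
    """Split a (T, J, 3) skeleton sequence into per-appendage motion features.

    Returns a (num_appendages, 3) array: mean absolute per-frame joint
    displacement for each appendage -- a hand-crafted stand-in for the
    features the ASP module would learn.
    """
    feats = []
    for idx in groups.values():
        part = skeleton[:, idx, :]          # (T, |idx|, 3)
        motion = np.diff(part, axis=0)      # frame-to-frame displacement
        feats.append(np.abs(motion).mean(axis=(0, 1)))
    return np.stack(feats)

def discrete_labels(features, codebook):
    """Assign each feature vector to its nearest codebook entry.

    This mimics the discrete-label target: continuous features are
    quantized into category indices that the encoder learns to predict,
    turning the self-supervised task into classification.
    """
    # (N, 1, D) - (1, K, D) broadcast -> (N, K) distance matrix
    d = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=-1)
    return d.argmin(axis=1)

rng = np.random.default_rng(0)
seq = rng.standard_normal((30, 21, 3))      # 30 frames, 21 joints, xyz
codebook = rng.standard_normal((8, 3))      # 8 discrete motion categories
labels = discrete_labels(appendage_features(seq), codebook)
print(labels)  # one discrete category index per appendage
```

In the actual framework the codebook entries would be refined (ALR) and aligned across coarse and fine scales (MGAA) rather than fixed at random as in this sketch.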

Original language: English
Journal: IEEE Internet of Things Journal
DOIs
Publication status: Accepted/In press - 2025
Externally published: Yes

Keywords

  • Affective Computing
  • Internet of Things (IoT)
  • Self-supervised
  • Skeleton-based Emotion Analysis

