Abstract
Emotion recognition is of great importance for human-computer interaction. Emotion recognition technology based on physiological signals has shown great potential because of its strong objectivity and real-time capability. One of the most challenging tasks in this field is fusing multi-source signals so that information is extracted as comprehensively as possible. We propose a new framework for multi-source signal fusion and emotion recognition that addresses key challenges in feature alignment and representation learning. First, to reduce the distance between multi-source homogeneous signals in the feature space, we design a novel Contrastive Pairs AutoEncoder (CPAE), which aligns features before aggregating the representations obtained from the Dual-LSTM. We also propose a cross-modal frequency module (CMF-Module) that uses a multi-layer perceptron (MLP) to learn the real and imaginary components of the signal's frequency representation and integrates a Resblock to achieve dual-channel time-domain and frequency-domain feature extraction. Furthermore, we incorporate the hidden ordinal relationships among emotional categories into the feature space through a regression loss and constrain the feature distribution using the Wasserstein distance. Experiments on public datasets show that the proposed method outperforms baseline methods, and ablation studies further verify the effectiveness of the proposed components.
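To make the dual-channel time-frequency idea concrete, below is a minimal PyTorch sketch, assuming the module receives batched multi-channel physiological signal windows. The module names (`ResBlock1D`, `FrequencyMLP`, `DualChannelExtractor`), layer sizes, and concatenation-based fusion are illustrative assumptions, not the authors' implementation.

```python
# Sketch of a dual-channel extractor: an MLP over the real and imaginary parts
# of the signal's frequency representation, combined with a residual
# time-domain branch. All names and hyperparameters are illustrative.
import torch
import torch.nn as nn


class ResBlock1D(nn.Module):
    """Simple residual block over the time-domain signal (assumed structure)."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.act = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(x + self.conv2(self.act(self.conv1(x))))


class FrequencyMLP(nn.Module):
    """MLP over the concatenated real and imaginary rFFT components."""

    def __init__(self, n_samples: int, hidden_dim: int, out_dim: int):
        super().__init__()
        n_freq = n_samples // 2 + 1  # number of rFFT bins
        self.net = nn.Sequential(
            nn.Linear(2 * n_freq, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        spec = torch.fft.rfft(x, dim=-1)                    # complex spectrum
        feats = torch.cat([spec.real, spec.imag], dim=-1)   # (B, C, 2 * n_freq)
        return self.net(feats)


class DualChannelExtractor(nn.Module):
    """Fuses the time-domain Resblock branch with the frequency-domain MLP branch."""

    def __init__(self, channels: int, n_samples: int,
                 hidden_dim: int = 128, out_dim: int = 64):
        super().__init__()
        self.time_branch = ResBlock1D(channels)
        self.freq_branch = FrequencyMLP(n_samples, hidden_dim, out_dim)
        self.pool = nn.AdaptiveAvgPool1d(1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, n_samples) physiological signal windows
        t_feat = self.pool(self.time_branch(x)).squeeze(-1)  # (B, channels)
        f_feat = self.freq_branch(x).mean(dim=1)             # (B, out_dim)
        return torch.cat([t_feat, f_feat], dim=-1)           # fused representation


# Example: a batch of 8 four-channel windows, 256 samples each.
model = DualChannelExtractor(channels=4, n_samples=256)
fused = model(torch.randn(8, 4, 256))
print(fused.shape)  # torch.Size([8, 68])
```

The concatenation at the end is only one possible fusion choice; the point of the sketch is that the frequency branch learns directly from the real and imaginary spectral components while the time branch preserves raw waveform structure through a residual path.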
Original language | English |
---|---|
Journal | IEEE Journal of Biomedical and Health Informatics |
DOIs | |
Publication status | Accepted/In press - 2025 |
Externally published | Yes |
Keywords
- affective computing
- emotion recognition
- IoMT
- multi-modal data fusion
- multi-modal sentiment analysis