TY - JOUR
T1 - Semantic Disentangling for Audiovisual Induced Emotion
AU - Dong, Qunxi
AU - Zheng, Wang
AU - Tian, Fuze
AU - Zhu, Lixian
AU - Qian, Kun
AU - Liu, Jingyu
AU - Zhang, Xuan
N1 - Publisher Copyright:
© 2014 IEEE.
PY - 2024
Y1 - 2024
N2 - Emotion regulation plays an important role in human behavior but exhibits considerable heterogeneity among individuals, which attenuates the generalization ability of emotion models. In this work, we aim to achieve robust emotion prediction through efficient disentanglement of affective semantic representations. In detail, the data generation mechanism behind observations from different perspectives is modeled causally, with the emotion-related latent variables explicitly separated into three parts: the intrinsic-related part, the extrinsic-related part, and the spurious-related part. Affective semantic features consist of the first two parts, with the understanding that spurious latent variables generate the inherent biases in the data. Furthermore, a variational autoencoder with a reformulated objective function is proposed to learn these disentangled latent variables; only the semantic representations are used for the final classification task, which avoids interference from spurious variables. In addition, for the electroencephalography (EEG) data used in this article, a space-frequency mapping method is introduced to improve information utilization. Comprehensive experiments on popular emotion datasets show that the proposed method achieves competitive intersubject generalization performance. Our results highlight the potential of efficient latent representation disentanglement in addressing the complexity challenges of emotion recognition.
AB - Emotion regulation plays an important role in human behavior but exhibits considerable heterogeneity among individuals, which attenuates the generalization ability of emotion models. In this work, we aim to achieve robust emotion prediction through efficient disentanglement of affective semantic representations. In detail, the data generation mechanism behind observations from different perspectives is modeled causally, with the emotion-related latent variables explicitly separated into three parts: the intrinsic-related part, the extrinsic-related part, and the spurious-related part. Affective semantic features consist of the first two parts, with the understanding that spurious latent variables generate the inherent biases in the data. Furthermore, a variational autoencoder with a reformulated objective function is proposed to learn these disentangled latent variables; only the semantic representations are used for the final classification task, which avoids interference from spurious variables. In addition, for the electroencephalography (EEG) data used in this article, a space-frequency mapping method is introduced to improve information utilization. Comprehensive experiments on popular emotion datasets show that the proposed method achieves competitive intersubject generalization performance. Our results highlight the potential of efficient latent representation disentanglement in addressing the complexity challenges of emotion recognition.
KW - Affective computing
KW - causal model
KW - emotion regulation (ER)
KW - music therapy
KW - semantic representation
UR - http://www.scopus.com/inward/record.url?scp=85204557462&partnerID=8YFLogxK
U2 - 10.1109/TCSS.2024.3450717
DO - 10.1109/TCSS.2024.3450717
M3 - Article
AN - SCOPUS:85204557462
SN - 2329-924X
JO - IEEE Transactions on Computational Social Systems
JF - IEEE Transactions on Computational Social Systems
ER -