Semantic Disentangling for Audiovisual Induced Emotion

Qunxi Dong, Wang Zheng, Fuze Tian, Lixian Zhu, Kun Qian, Jingyu Liu*, Xuan Zhang*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Emotion regulation plays an important role in human behavior, but it exhibits considerable heterogeneity across individuals, which attenuates the generalization ability of emotion models. In this work, we aim to achieve robust emotion prediction through efficient disentanglement of affective semantic representations. Specifically, the data generation mechanism behind observations from different perspectives is modeled causally, with the emotion-related latent variables explicitly separated into three parts: an intrinsic-related part, an extrinsic-related part, and a spurious-related part. The first two parts constitute the affective semantic features, while the spurious latent variables account for the inherent biases in the data. Furthermore, a variational autoencoder with a reformulated objective function is proposed to learn these disentangled latent variables, and only the semantic representations are used for the final classification task, avoiding interference from spurious variables. In addition, for the electroencephalography (EEG) data used in this article, a space-frequency mapping method is introduced to improve information utilization. Comprehensive experiments on popular emotion datasets show that the proposed method achieves competitive intersubject generalization performance. Our results highlight the potential of efficient latent representation disentanglement for addressing the complexity challenges of emotion recognition.
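The abstract describes a variational autoencoder whose latent space is explicitly partitioned into intrinsic, extrinsic, and spurious parts, with only the first two (the affective semantic representation) reaching the classifier. The sketch below illustrates only that latent-partitioning idea; the paper's reformulated objective is not reproduced, and all names and dimensions (DisentangledVAE, dim_i, dim_e, dim_s, layer sizes, the input width) are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (PyTorch) of a VAE with a partitioned latent space:
# z = [z_intrinsic, z_extrinsic, z_spurious]; only the first two parts
# (the "affective semantic" representation) feed the emotion classifier.
# All dimensions and layer sizes are illustrative, not the paper's.
import torch
import torch.nn as nn

class DisentangledVAE(nn.Module):
    def __init__(self, in_dim=310, dim_i=16, dim_e=16, dim_s=16, n_classes=3):
        super().__init__()
        z_dim = dim_i + dim_e + dim_s
        self.dim_sem = dim_i + dim_e              # semantic = intrinsic + extrinsic
        self.enc = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU())
        self.mu, self.logvar = nn.Linear(256, z_dim), nn.Linear(256, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(),
                                 nn.Linear(256, in_dim))
        self.clf = nn.Linear(self.dim_sem, n_classes)  # spurious part excluded

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        x_hat = self.dec(z)                       # reconstruction uses all latents
        logits = self.clf(z[:, :self.dim_sem])    # classification ignores z_spurious
        return x_hat, logits, mu, logvar

def loss_fn(x, y, model, beta=1.0):
    # Standard ELBO terms plus cross-entropy on the semantic latents only;
    # the paper's actual reformulated objective differs in its regularizers.
    x_hat, logits, mu, logvar = model(x)
    rec = nn.functional.mse_loss(x_hat, x)
    kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    ce = nn.functional.cross_entropy(logits, y)
    return rec + beta * kld + ce
```

Because reconstruction sees the full latent vector while the classifier sees only the semantic slice, dataset-specific biases have a place to live (the spurious part) without contaminating the emotion prediction, which is the intuition behind the intersubject generalization claim.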

Original language: English
Journal: IEEE Transactions on Computational Social Systems
Publication status: Accepted/In press - 2024

Keywords

  • Affective computing
  • causal model
  • emotion regulation (ER)
  • music therapy
  • semantic representation

