TY - JOUR
T1 - MTLFuseNet
T2 - A novel emotion recognition model based on deep latent feature fusion of EEG signals and multi-task learning
AU - Li, Rui
AU - Ren, Chao
AU - Ge, Yiqing
AU - Zhao, Qiqi
AU - Yang, Yikun
AU - Shi, Yuhan
AU - Zhang, Xiaowei
AU - Hu, Bin
N1 - Publisher Copyright:
© 2023 Elsevier B.V.
PY - 2023/9/27
Y1 - 2023/9/27
AB - Extracting discriminative latent feature representations from electroencephalography (EEG) signals and building a generalized model remain open problems in EEG-based emotion recognition research. This study proposed a novel emotion recognition model based on deep latent feature fusion of EEG signals and multi-task learning, referred to as MTLFuseNet. MTLFuseNet learned spatio-temporal latent features of EEG in an unsupervised manner with a variational autoencoder (VAE) and learned spatio-spectral features of EEG in a supervised manner with a graph convolutional network (GCN) and a gated recurrent unit (GRU) network. The two latent features were then fused to form more complementary and discriminative spatio-temporal–spectral fusion features for EEG signal representation. In addition, MTLFuseNet was constructed based on multi-task learning: the focal loss was introduced to address unbalanced sample classes in the emotional datasets, and the triplet-center loss was introduced to make the fused latent feature vectors more discriminative. Finally, a subject-independent leave-one-subject-out cross-validation strategy was used to validate the model extensively on two public datasets, DEAP and DREAMER. On the DEAP dataset, the average accuracies for valence and arousal were 71.33% and 73.28%, respectively; on the DREAMER dataset, they were 80.43% and 83.33%, respectively. The experimental results show that the proposed MTLFuseNet model achieves excellent recognition performance, outperforming state-of-the-art methods.
KW - EEG
KW - Emotion recognition
KW - Feature fusion
KW - Multi-task learning
UR - http://www.scopus.com/inward/record.url?scp=85165082775&partnerID=8YFLogxK
U2 - 10.1016/j.knosys.2023.110756
DO - 10.1016/j.knosys.2023.110756
M3 - Article
AN - SCOPUS:85165082775
SN - 0950-7051
VL - 276
JO - Knowledge-Based Systems
JF - Knowledge-Based Systems
M1 - 110756
ER -