Abstract
In recent years, physiological signals have been studied increasingly for emotion recognition, with the aim of bringing emotional intelligence to human-computer interaction. However, because emotions are complex and physiological responses vary between individuals, designing reliable and effective models remains a significant challenge. In this article, we propose a regularized deep fusion framework for emotion recognition based on multimodal physiological signals. After extracting effective features from each type of physiological signal, we construct ensemble dense embeddings of the multimodal features using kernel matrices. We then use a deep network architecture to learn a task-specific representation for each kind of physiological signal from these ensemble dense embeddings. Finally, the generated representations are fused by a global fusion layer with a regularization term, which efficiently explores the correlation and diversity among all of the representations in a synchronous optimization process. Experiments on two benchmark datasets show that this framework improves the performance of subject-independent emotion recognition compared with single-modal classifiers and other fusion methods. Data visualization also shows that the final fused representation has greater class separability for emotion recognition.
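The pipeline described above (kernel-based dense embeddings, one learned representation per modality, and a regularized fusion layer) can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the abstract does not specify the kernel, how the "ensemble" embedding is formed, the network depth, or the form of the regularizer. Here the ensemble dense embedding is assumed to be RBF kernel columns against a set of anchor samples at several bandwidths, each modality's deep network is replaced by a single ReLU layer, the fusion layer is a weighted sum, and the regularizer is a hypothetical pairwise cross-covariance penalty that pushes the modality representations toward diversity. All function names, parameters, and the synthetic data are illustrative.

```python
import numpy as np

def rbf_kernel(X, Y, gamma):
    """Gaussian (RBF) kernel matrix between the rows of X and the rows of Y."""
    sq = (X ** 2).sum(axis=1)[:, None] + (Y ** 2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clip tiny negatives caused by round-off

def ensemble_embedding(X, anchors, gammas):
    """Hypothetical 'ensemble dense embedding': kernel columns at several bandwidths, concatenated."""
    return np.hstack([rbf_kernel(X, anchors, g) for g in gammas])

def modality_branch(E, W, b):
    """Stand-in for a per-modality deep network: a single dense layer with ReLU."""
    return np.maximum(E @ W + b, 0.0)

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    expZ = np.exp(Z)
    return expZ / expZ.sum(axis=1, keepdims=True)

def fusion_loss(reps, weights, V, y, lam):
    """Cross-entropy of a weighted-sum fusion layer plus a hypothetical
    pairwise cross-covariance (decorrelation) penalty between modalities."""
    fused = sum(w * H for w, H in zip(weights, reps))
    P = softmax(fused @ V)
    ce = -np.mean(np.log(P[np.arange(len(y)), y] + 1e-12))
    centered = [H - H.mean(axis=0) for H in reps]
    penalty = 0.0
    for i in range(len(centered)):
        for j in range(i + 1, len(centered)):
            C = centered[i].T @ centered[j] / len(y)  # cross-covariance of two modalities
            penalty += np.sum(C ** 2)
    return ce + lam * penalty, fused

# Synthetic example: two "signal types" with different feature counts.
rng = np.random.default_rng(0)
n, d_hidden, n_classes = 40, 8, 3
modalities = [rng.normal(size=(n, 5)), rng.normal(size=(n, 12))]
y = rng.integers(0, n_classes, size=n)
gammas = [0.1, 1.0]

reps = []
for X in modalities:
    anchors = X[:10]  # hypothetical choice: first 10 samples serve as kernel anchors
    E = ensemble_embedding(X, anchors, gammas)  # shape (n, 10 * len(gammas))
    W = rng.normal(scale=0.1, size=(E.shape[1], d_hidden))
    reps.append(modality_branch(E, W, np.zeros(d_hidden)))

V = rng.normal(scale=0.1, size=(d_hidden, n_classes))
loss, fused = fusion_loss(reps, [0.5, 0.5], V, y, lam=0.01)
print(fused.shape, float(loss))
```

Because the classification loss and the penalty are summed into a single objective, minimizing it would update the modality branches and the fusion layer together, which is one possible reading of "synchronous optimization"; the gradient step itself is omitted from this sketch.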
| Field | Value |
|---|---|
| Original language | English |
| Pages (from-to) | 4386-4399 |
| Number of pages | 14 |
| Journal | IEEE Transactions on Cybernetics |
| Volume | 51 |
| Issue number | 9 |
| DOIs | |
| Publication status | Published, September 2021 |
| Externally published | Yes |
Keywords
- Deep neural network
- Emotion recognition
- Kernel machine
- Multimodal fusion