Abstract
A coupled multimodal emotional feature analysis (CMEFA) method based on broad–deep fusion networks is proposed, which divides multimodal emotion recognition into two layers. First, facial and gesture emotional features are extracted using the broad and deep learning fusion network (BDFN). Because the emotions expressed by the two modalities are not completely independent of each other, canonical correlation analysis (CCA) is used to analyze and extract the correlation between the emotion features, and a coupling network is established for emotion recognition from the extracted bimodal features. Both simulation and application experiments are conducted. In the simulation experiments on the bimodal face and body gesture database (FABO), the recognition rate of the proposed method is 1.15% higher than that of support vector machine recursive feature elimination (SVMRFE) (without considering the unbalanced contribution of features). Moreover, the multimodal recognition rate of the proposed method is 21.22%, 2.65%, 1.61%, 1.54%, and 0.20% higher than those of the fuzzy deep neural network with sparse autoencoder (FDNNSA), ResNet-101 + GFK, C3D + MCB + DBN, the hierarchical classification fusion strategy (HCFS), and the cross-channel convolutional neural network (CCCNN), respectively. In addition, preliminary application experiments are carried out on our emotional social robot system, in which the robot recognizes the emotions of eight volunteers from their facial expressions and body gestures.
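The CCA step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `cca`, the regularization term `reg`, the feature dimensions, and the synthetic "face" and "gesture" matrices are all assumptions made for illustration, and the BDFN feature extractor and coupling network are not modeled. The sketch only shows how CCA finds projections of two feature sets that are maximally correlated.

```python
import numpy as np


def cca(X, Y, n_components=2, reg=1e-6):
    """Classical CCA for X (n, p) and Y (n, q).

    Returns projection matrices Wx (p, k), Wy (q, k) and the k largest
    canonical correlations. `reg` is a small ridge term (an assumption
    of this sketch) that keeps the covariance matrices invertible.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]

    # Sample covariance and cross-covariance matrices.
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)

    # Whiten each view with its Cholesky factor: S = L L^T.
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)

    # Cross-covariance of the whitened views: Lx^{-1} Sxy Ly^{-T}.
    T = np.linalg.solve(Lx, Sxy) @ np.linalg.inv(Ly).T

    # Its singular values are the canonical correlations.
    U, s, Vt = np.linalg.svd(T, full_matrices=False)

    # Map the whitened directions back to the original feature spaces.
    Wx = np.linalg.solve(Lx.T, U[:, :n_components])
    Wy = np.linalg.solve(Ly.T, Vt.T[:, :n_components])
    return Wx, Wy, s[:n_components]


# Synthetic stand-ins for the two modalities (hypothetical data): one
# shared latent "emotion" signal is embedded in both feature sets.
rng = np.random.default_rng(0)
n = 500
latent = rng.normal(size=(n, 1))
face = np.hstack([latent + 0.1 * rng.normal(size=(n, 1)),
                  rng.normal(size=(n, 3))])
gesture = np.hstack([rng.normal(size=(n, 2)),
                     -latent + 0.1 * rng.normal(size=(n, 1))])

Wx, Wy, corrs = cca(face, gesture, n_components=2)

# Project each modality onto its canonical directions.
face_proj = (face - face.mean(axis=0)) @ Wx
gesture_proj = (gesture - gesture.mean(axis=0)) @ Wy
print("canonical correlations:", corrs)
```

In this toy setup the first canonical correlation is high because both feature sets share the latent signal, while the second is low because the remaining dimensions are independent noise. Projected features such as `face_proj` and `gesture_proj` are the kind of correlated representation that a downstream recognizer could consume.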
Original language | English |
---|---|
Pages (from-to) | 1-11 |
Number of pages | 11 |
Journal | IEEE Transactions on Neural Networks and Learning Systems |
Publication status | Accepted/In press - 2023 |
Externally published | Yes |
Keywords
- Broad learning
- Convolution
- Convolutional neural networks
- Correlation
- Emotion recognition
- Feature extraction
- Kernel
- Neural networks
- deep feature fusion
- deep neural networks
- human–robot interaction
- multimodal emotion recognition