TY - JOUR
T1 - Coupled Multimodal Emotional Feature Analysis Based on Broad-Deep Fusion Networks in Human–Robot Interaction
AU - Chen, Luefeng
AU - Li, Min
AU - Wu, Min
AU - Pedrycz, Witold
AU - Hirota, Kaoru
N1 - Publisher Copyright:
IEEE
PY - 2023
Y1 - 2023
AB - A coupled multimodal emotional feature analysis (CMEFA) method based on broad–deep fusion networks is proposed; it divides multimodal emotion recognition into two layers. First, facial emotional features and gesture emotional features are extracted using the broad and deep learning fusion network (BDFN). Considering that the two modalities are not completely independent of each other, canonical correlation analysis (CCA) is used to analyze and extract the correlation between the emotional features, and a coupling network is established for emotion recognition of the extracted bimodal features. Both simulation and application experiments are completed. In simulation experiments on the bimodal face and body gesture database (FABO), the recognition rate of the proposed method is 1.15% higher than that of support vector machine recursive feature elimination (SVMRFE) (without considering the unbalanced contribution of features). Moreover, the multimodal recognition rate of the proposed method is 21.22%, 2.65%, 1.61%, 1.54%, and 0.20% higher than those of the fuzzy deep neural network with sparse autoencoder (FDNNSA), ResNet-101 + GFK, C3D + MCB + DBN, the hierarchical classification fusion strategy (HCFS), and the cross-channel convolutional neural network (CCCNN), respectively. In addition, preliminary application experiments are carried out on our developed emotional social robot system, in which the robot recognizes the emotions of eight volunteers based on their facial expressions and body gestures.
KW - Broad learning
KW - Convolution
KW - Convolutional neural networks
KW - Correlation
KW - Emotion recognition
KW - Feature extraction
KW - Kernel
KW - Neural networks
KW - deep feature fusion
KW - deep neural networks
KW - human–robot interaction
KW - multimodal emotion recognition
UR - http://www.scopus.com/inward/record.url?scp=85147283251&partnerID=8YFLogxK
U2 - 10.1109/TNNLS.2023.3236320
DO - 10.1109/TNNLS.2023.3236320
M3 - Article
AN - SCOPUS:85147283251
SN - 2162-237X
SP - 1
EP - 11
JO - IEEE Transactions on Neural Networks and Learning Systems
JF - IEEE Transactions on Neural Networks and Learning Systems
ER -