TY - GEN
T1 - Deep Residual Network with D-S Evidence Theory for Bimodal Emotion Recognition
AU - Liu, Yulong
AU - Chen, Luefeng
AU - Li, Min
AU - Wu, Min
AU - Pedrycz, Witold
AU - Hirota, Kaoru
N1 - Publisher Copyright:
© 2021 IEEE
PY - 2021
Y1 - 2021
AB - In this paper, a Deep Residual Network (ResNet) combined with Dempster-Shafer (D-S) evidence theory is presented for bimodal emotion recognition using facial expression and speech information. By extracting discriminative emotion features and fusing the two modalities, the method overcomes the limitations of single-modal emotion recognition and achieves higher recognition accuracy. Key regions of the facial expressions and speech spectrograms are first used to acquire low-level emotion characteristics. Two ResNets are then designed to extract high-level emotional semantic features. Finally, within the framework of D-S evidence theory, the output probability values of the two networks are combined to fuse the modalities and improve the effectiveness of bimodal emotion recognition. Experiments on the eNTERFACE'05 database demonstrate a recognition accuracy of 88.67%, an improvement of 23.11% and 9.32% over facial expression alone and speech alone, respectively.
KW - Bimodal emotion recognition
KW - D-S evidence theory
KW - Deep Residual Network
UR - https://www.scopus.com/pages/publications/85128069955
U2 - 10.1109/CAC53003.2021.9727443
DO - 10.1109/CAC53003.2021.9727443
M3 - Conference contribution
AN - SCOPUS:85128069955
T3 - Proceeding - 2021 China Automation Congress, CAC 2021
SP - 4674
EP - 4679
BT - Proceeding - 2021 China Automation Congress, CAC 2021
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2021 China Automation Congress, CAC 2021
Y2 - 22 October 2021 through 24 October 2021
ER -