TY - JOUR
T1 - Two-layer fuzzy multiple random forest for speech emotion recognition in human-robot interaction
AU - Chen, Luefeng
AU - Su, Wanjuan
AU - Feng, Yu
AU - Wu, Min
AU - She, Jinhua
AU - Hirota, Kaoru
N1 - Publisher Copyright: © 2019
PY - 2020/1
Y1 - 2020/1
N2 - A two-layer fuzzy multiple random forest (TLFMRF) is proposed for speech emotion recognition. Speech emotion recognition usually faces two problems: feature extraction relies on personalized features, and recognition does not consider the differences among different categories of people. In the proposal, personalized and non-personalized features are fused for speech emotion recognition. High-dimensional emotional features are divided into different subclasses by the fuzzy C-means clustering algorithm, and a multiple random forest is used to recognize the different emotional states. Finally, a TLFMRF is established. Moreover, certain emotions that are difficult to recognize are classified separately. The results show that the TLFMRF can identify emotions in a stable manner. To demonstrate the effectiveness of the proposal, experiments are conducted on the CASIA corpus and the Berlin EmoDB. Experimental results show that the recognition accuracies of the proposal are 1.39%–7.64% and 4.06%–4.30% higher than those of the back propagation neural network and the random forest, respectively. Preliminary application experiments are also conducted on an emotional social robot system, and the results indicate that the mobile robot can track six basic emotions in real time: angry, fear, happy, neutral, sad, and surprise.
AB - A two-layer fuzzy multiple random forest (TLFMRF) is proposed for speech emotion recognition. Speech emotion recognition usually faces two problems: feature extraction relies on personalized features, and recognition does not consider the differences among different categories of people. In the proposal, personalized and non-personalized features are fused for speech emotion recognition. High-dimensional emotional features are divided into different subclasses by the fuzzy C-means clustering algorithm, and a multiple random forest is used to recognize the different emotional states. Finally, a TLFMRF is established. Moreover, certain emotions that are difficult to recognize are classified separately. The results show that the TLFMRF can identify emotions in a stable manner. To demonstrate the effectiveness of the proposal, experiments are conducted on the CASIA corpus and the Berlin EmoDB. Experimental results show that the recognition accuracies of the proposal are 1.39%–7.64% and 4.06%–4.30% higher than those of the back propagation neural network and the random forest, respectively. Preliminary application experiments are also conducted on an emotional social robot system, and the results indicate that the mobile robot can track six basic emotions in real time: angry, fear, happy, neutral, sad, and surprise.
KW - Fuzzy C-means
KW - Human-robot interaction
KW - Multiple random forest
KW - Speech emotion recognition
UR - http://www.scopus.com/inward/record.url?scp=85071968961&partnerID=8YFLogxK
U2 - 10.1016/j.ins.2019.09.005
DO - 10.1016/j.ins.2019.09.005
M3 - Article
AN - SCOPUS:85071968961
SN - 0020-0255
VL - 509
SP - 150
EP - 163
JO - Information Sciences
JF - Information Sciences
ER -