Two-layer fuzzy multiple random forest for speech emotion recognition in human-robot interaction

Luefeng Chen; Wanjuan Su; Yu Feng; Min Wu; Jinhua She; Kaoru Hirota

doi:10.1016/j.ins.2019.09.005

Luefeng Chen, Wanjuan Su, Yu Feng, Min Wu^*, Jinhua She, Kaoru Hirota

^*Corresponding author for this work

School of Automation

Research output: Contribution to journal › Article › peer-review

149 Citations (Scopus)

Abstract

The two-layer fuzzy multiple random forest (TLFMRF) is proposed for speech emotion recognition. Speech emotion recognition usually faces two problems: feature extraction relies on personalized features, and recognition does not consider the differences among different categories of people. In the proposal, personalized and non-personalized features are fused for speech emotion recognition. High-dimensional emotional features are divided into different subclasses by the fuzzy C-means clustering algorithm, and multiple random forests are used to recognize different emotional states; together these form the TLFMRF. Moreover, certain emotions that are difficult to recognize are classified separately. The results show that the TLFMRF can identify emotions in a stable manner. To demonstrate the effectiveness of the proposal, experiments on the CASIA corpus and Berlin EmoDB are conducted. Experimental results show that the recognition accuracies of the proposal are 1.39%–7.64% and 4.06%–4.30% higher than those of the back-propagation neural network and random forest, respectively. Preliminary application experiments on an emotional social robot system are also conducted, and the results indicate that a mobile robot can track six basic emotions in real time: angry, fear, happy, neutral, sad, and surprise.
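
The clustering step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it runs plain fuzzy C-means on toy 2-D points and then groups samples by their highest membership, which is an assumption about how subclasses might be formed. The function names, the fuzzifier `m = 2.0`, the toy data, and the hard assignment are all hypothetical, and the per-subclass random forests are left out.

```python
import math
import random


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Plain fuzzy C-means on a list of equal-length feature tuples.

    Returns (centers, memberships). memberships[i][k] is the degree to
    which points[i] belongs to cluster k; each row sums to 1.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    # Random initial memberships, normalised so each row sums to 1.
    u = []
    for _ in points:
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(n_iter):
        # Centre update: mean of the points weighted by membership ** m.
        centers = []
        for k in range(n_clusters):
            weights = [row[k] ** m for row in u]
            wsum = sum(weights)
            centers.append(tuple(
                sum(w * p[d] for w, p in zip(weights, points)) / wsum
                for d in range(dim)
            ))
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij) ** (2 / (m - 1)).
        new_u = []
        for p in points:
            dists = [max(math.dist(p, c), 1e-12) for c in centers]
            new_u.append([
                1.0 / sum((dk / dj) ** (2.0 / (m - 1.0)) for dj in dists)
                for dk in dists
            ])
        shift = max(abs(a - b) for ra, rb in zip(u, new_u) for a, b in zip(ra, rb))
        u = new_u
        if shift < tol:
            break
    return centers, u


def split_by_cluster(samples, memberships):
    """Group samples by their highest-membership cluster (an assumed rule)."""
    groups = {}
    for sample, row in zip(samples, memberships):
        k = max(range(len(row)), key=row.__getitem__)
        groups.setdefault(k, []).append(sample)
    return groups


# Toy stand-in for fused emotional features: two well-separated blobs.
toy = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centers, u = fuzzy_c_means(toy, n_clusters=2)
groups = split_by_cluster(toy, u)
```

In a two-layer arrangement of this kind, each group in `groups` would be the training set for its own classifier, for example one random forest per subclass.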

Original language	English
Pages (from-to)	150-163
Number of pages	14
Journal	Information Sciences
Volume	509
DOIs	https://doi.org/10.1016/j.ins.2019.09.005
Publication status	Published - Jan 2020

Keywords

Fuzzy C-means
Human-robot interaction
Multiple random forest
Speech emotion recognition

Cite this

Chen, L., Su, W., Feng, Y., Wu, M., She, J., & Hirota, K. (2020). Two-layer fuzzy multiple random forest for speech emotion recognition in human-robot interaction. Information Sciences, 509, 150-163. https://doi.org/10.1016/j.ins.2019.09.005

@article{9f3112160efa4f22becbcc967af2690d,

title = "Two-layer fuzzy multiple random forest for speech emotion recognition in human-robot interaction",

abstract = "The two-layer fuzzy multiple random forest (TLFMRF) is proposed for speech emotion recognition. When recognizing speech emotion, there are usually some problems. One is that feature extraction relies on personalized features. The other is that emotion recognition doesn't consider the differences among different categories of people. In the proposal, personalized and non-personalized features are fused for speech emotion recognition. High dimensional emotional features are divided into different subclasses by adopting the fuzzy C-means clustering algorithm, and multiple random forest is used to recognize different emotional states. Finally, a TLFMRF is established. Moreover, a separate classification of certain emotions which are difficult to recognize to some extent is conducted. The results show that the TLFMRF can identify emotions in a stable manner. To demonstrate the effectiveness of the proposal, experiments on CASIA corpus and Berlin EmoDB are conducted. Experimental results show the recognition accuracies of the proposal are 1.39\%--7.64\% and 4.06\%--4.30\% higher than that of back propagation neural network and random forest respectively. Meanwhile, preliminary application experiments are also conducted to investigate the emotional social robot system, and application results indicate that mobile robot can real-time track six basic emotions, including angry, fear, happy, neutral, sad, and surprise.",

keywords = "Fuzzy C-means, Human-robot interaction, Multiple random forest, Speech emotion recognition",

author = "Luefeng Chen and Wanjuan Su and Yu Feng and Min Wu and Jinhua She and Kaoru Hirota",

note = "Publisher Copyright: {\textcopyright} 2019",

year = "2020",

month = jan,

doi = "10.1016/j.ins.2019.09.005",

language = "English",

volume = "509",

pages = "150--163",

journal = "Information Sciences",

issn = "0020-0255",

publisher = "Elsevier Inc.",

}

TY - JOUR

T1 - Two-layer fuzzy multiple random forest for speech emotion recognition in human-robot interaction

AU - Chen, Luefeng

AU - Su, Wanjuan

AU - Feng, Yu

AU - Wu, Min

AU - She, Jinhua

AU - Hirota, Kaoru

PY - 2020/1

Y1 - 2020/1

N2 - The two-layer fuzzy multiple random forest (TLFMRF) is proposed for speech emotion recognition. When recognizing speech emotion, there are usually some problems. One is that feature extraction relies on personalized features. The other is that emotion recognition doesn't consider the differences among different categories of people. In the proposal, personalized and non-personalized features are fused for speech emotion recognition. High dimensional emotional features are divided into different subclasses by adopting the fuzzy C-means clustering algorithm, and multiple random forest is used to recognize different emotional states. Finally, a TLFMRF is established. Moreover, a separate classification of certain emotions which are difficult to recognize to some extent is conducted. The results show that the TLFMRF can identify emotions in a stable manner. To demonstrate the effectiveness of the proposal, experiments on CASIA corpus and Berlin EmoDB are conducted. Experimental results show the recognition accuracies of the proposal are 1.39%–7.64% and 4.06%–4.30% higher than that of back propagation neural network and random forest respectively. Meanwhile, preliminary application experiments are also conducted to investigate the emotional social robot system, and application results indicate that mobile robot can real-time track six basic emotions, including angry, fear, happy, neutral, sad, and surprise.

AB - The two-layer fuzzy multiple random forest (TLFMRF) is proposed for speech emotion recognition. When recognizing speech emotion, there are usually some problems. One is that feature extraction relies on personalized features. The other is that emotion recognition doesn't consider the differences among different categories of people. In the proposal, personalized and non-personalized features are fused for speech emotion recognition. High dimensional emotional features are divided into different subclasses by adopting the fuzzy C-means clustering algorithm, and multiple random forest is used to recognize different emotional states. Finally, a TLFMRF is established. Moreover, a separate classification of certain emotions which are difficult to recognize to some extent is conducted. The results show that the TLFMRF can identify emotions in a stable manner. To demonstrate the effectiveness of the proposal, experiments on CASIA corpus and Berlin EmoDB are conducted. Experimental results show the recognition accuracies of the proposal are 1.39%–7.64% and 4.06%–4.30% higher than that of back propagation neural network and random forest respectively. Meanwhile, preliminary application experiments are also conducted to investigate the emotional social robot system, and application results indicate that mobile robot can real-time track six basic emotions, including angry, fear, happy, neutral, sad, and surprise.

KW - Fuzzy C-means

KW - Human-robot interaction

KW - Multiple random forest

KW - Speech emotion recognition

UR - http://www.scopus.com/inward/record.url?scp=85071968961&partnerID=8YFLogxK

U2 - 10.1016/j.ins.2019.09.005

DO - 10.1016/j.ins.2019.09.005

M3 - Article

AN - SCOPUS:85071968961

SN - 0020-0255

VL - 509

SP - 150

EP - 163

JO - Information Sciences

JF - Information Sciences

ER -
