Chinese speech emotion recognition based on bidirectional long short-term memory network

Hu Du, Kaoru Hirota*, Yaping Dai, Donggyun Kim, Junjie Ma

*Corresponding author for this work

Research output: Contribution to conference › Paper › peer-review

Abstract

A bidirectional long short-term memory (BLSTM) network is applied to improve the accuracy of Chinese speech emotion recognition for six basic human emotions (angry, fear, happy, neutral, sad, and surprise). The BLSTM network learns and stores emotional features in a special architecture called memory blocks, which retain information across a long sentence. Because the context of a sentence matters, the network also provides information from both the history and the future of the current frame. Experiments on the CASIA Chinese emotion corpus show that the average recognition accuracy reaches 73.83%, an increase of 9.83% over the method based on information cell, 7.83% over Mel Frequency Cepstrum Coefficient and Principal Component Analysis, and 24.83% over Random Deep Belief Networks.
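The bidirectional idea in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation. The abstract does not state the layer sizes, the input features, how per-frame outputs are combined, or the training procedure, so everything below is an assumption made for illustration: the names `init_lstm`, `lstm_step`, `run_lstm`, `blstm` and `classify`, the dimensions, the mean pooling over frames, the random (untrained) weights, and the random vectors standing in for acoustic frames. The sketch only shows how a forward pass and a backward pass give every frame access to both its past and its future.

```python
import math
import random

EMOTIONS = ["angry", "fear", "happy", "neutral", "sad", "surprise"]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def init_lstm(n_in, n_hidden, rng):
    """Random, untrained weights for the four gates stacked in one matrix."""
    W = [[rng.uniform(-0.1, 0.1) for _ in range(n_in + n_hidden)]
         for _ in range(4 * n_hidden)]
    b = [0.0] * (4 * n_hidden)
    return W, b


def lstm_step(params, x, h, c):
    """One LSTM step: gates decide what the memory cell keeps and emits."""
    W, b = params
    H = len(h)
    z = [s + bias for s, bias in zip(matvec(W, x + h), b)]
    i = [sigmoid(v) for v in z[:H]]              # input gate
    f = [sigmoid(v) for v in z[H:2 * H]]         # forget gate
    g = [math.tanh(v) for v in z[2 * H:3 * H]]   # candidate cell values
    o = [sigmoid(v) for v in z[3 * H:]]          # output gate
    c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
    h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
    return h_new, c_new


def run_lstm(params, frames, n_hidden):
    """Run one direction over the frames and return every hidden state."""
    h, c = [0.0] * n_hidden, [0.0] * n_hidden
    states = []
    for x in frames:
        h, c = lstm_step(params, x, h, c)
        states.append(h)
    return states


def blstm(fw, bw, frames, n_hidden):
    """Concatenate forward (history) and backward (future) states per frame."""
    forward = run_lstm(fw, frames, n_hidden)
    backward = run_lstm(bw, frames[::-1], n_hidden)[::-1]
    return [f + b for f, b in zip(forward, backward)]


def classify(states, W_out):
    """Assumed head: mean-pool over frames, linear layer, softmax."""
    pooled = [sum(col) / len(states) for col in zip(*states)]
    logits = matvec(W_out, pooled)
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical sizes; random vectors stand in for real acoustic features.
rng = random.Random(0)
N_FEAT, N_HIDDEN, N_FRAMES = 13, 8, 12
fw = init_lstm(N_FEAT, N_HIDDEN, rng)
bw = init_lstm(N_FEAT, N_HIDDEN, rng)
W_out = [[rng.uniform(-0.1, 0.1) for _ in range(2 * N_HIDDEN)]
         for _ in EMOTIONS]
frames = [[rng.gauss(0, 1) for _ in range(N_FEAT)] for _ in range(N_FRAMES)]

states = blstm(fw, bw, frames, N_HIDDEN)
probs = classify(states, W_out)
print(dict(zip(EMOTIONS, (round(p, 3) for p in probs))))
```

Because the weights are untrained, the printed probabilities carry no meaning. The point is structural: the second half of each per-frame state comes from the backward pass, so even the first frame's representation changes when the last frame changes, which a unidirectional LSTM cannot offer.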

Original language: English
Publication status: Published - 2017
Event: 5th International Workshop on Advanced Computational Intelligence and Intelligent Informatics, IWACIII 2017 - Beijing, China
Duration: 2 Nov 2017 → 5 Nov 2017

Conference

Conference: 5th International Workshop on Advanced Computational Intelligence and Intelligent Informatics, IWACIII 2017
Country/Territory: China
City: Beijing
Period: 2/11/17 → 5/11/17

Keywords

  • Bidirectional long short-term memory
  • CASIA Chinese corpus
  • Chinese speech emotion recognition
