FI-Net: A Speech Emotion Recognition Framework with Feature Integration and Data Augmentation

Guangmin Xia, Fan Li*, Dongdi Zhao, Qian Zhang, Song Yang

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

3 Citations (Scopus)

Abstract

Speech emotion recognition, an important auxiliary component of speech interaction technology, has long been an active research topic. In this work, we propose a novel speech emotion recognition framework based on deep neural networks. The framework consists of two main modules: a local feature extractor that uses deep recurrent layers to extract frame-level feature representations, and a global feature integration module that learns utterance-level representations for emotion recognition. Two architectures are constructed for the feature integration module: a multi-granularity convolutional layer and a multi-scale attentive layer. Furthermore, we adopt two data augmentation approaches, noise injection and vocal tract length perturbation, both of which improve model performance and robustness and reduce the influence of variation between individual speakers. The proposed models achieve recognition accuracies of 92.08% and 90.41% on the Emo-DB and CASIA datasets, respectively. Ablation experiments further demonstrate the effectiveness of the proposed feature integration module and data augmentation approaches.
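The abstract names the two data augmentation approaches but not their parameters. The following is a minimal Python/NumPy sketch of one common way to implement each, under stated assumptions: noise injection is taken to mean additive white Gaussian noise at a chosen signal-to-noise ratio, and vocal tract length perturbation (VTLP) is implemented as the piecewise-linear frequency warp commonly used for it, applied to a magnitude spectrogram. The function names (`add_noise`, `vtlp_warp`, `vtlp_spectrogram`), the boundary frequency `f_hi = 4800` Hz, the 20 dB SNR and the warp-factor range 0.9 to 1.1 are illustrative assumptions, not settings reported for FI-Net.

```python
import numpy as np


def add_noise(signal: np.ndarray, snr_db: float, rng: np.random.Generator) -> np.ndarray:
    """Noise injection (assumed form): add white Gaussian noise at the requested SNR in dB."""
    signal_power = np.mean(signal ** 2)
    # SNR_dB = 10 * log10(P_signal / P_noise)  =>  P_noise = P_signal / 10**(SNR_dB / 10)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise


def vtlp_warp(freqs: np.ndarray, alpha: float, sample_rate: float,
              f_hi: float = 4800.0) -> np.ndarray:
    """Piecewise-linear VTLP frequency warp; f_hi is an assumed boundary frequency."""
    nyquist = sample_rate / 2.0
    if alpha <= 0 or not 0 < f_hi < nyquist:
        raise ValueError("need alpha > 0 and 0 < f_hi < Nyquist frequency")
    boundary = f_hi * min(alpha, 1.0) / alpha
    slope_high = (nyquist - f_hi * min(alpha, 1.0)) / (nyquist - boundary)
    return np.where(
        freqs <= boundary,
        alpha * freqs,                             # scale low frequencies by alpha
        nyquist - slope_high * (nyquist - freqs),  # map the rest linearly so Nyquist stays fixed
    )


def vtlp_spectrogram(mag: np.ndarray, alpha: float, sample_rate: float) -> np.ndarray:
    """Apply VTLP to a magnitude spectrogram of shape (frames, bins) spanning 0..Nyquist."""
    freqs = np.linspace(0.0, sample_rate / 2.0, mag.shape[1])
    warped = vtlp_warp(freqs, alpha, sample_rate)
    # Energy originally at frequency f moves to warp(f). The warp is strictly increasing,
    # so each frame can be resampled back onto the regular grid by linear interpolation.
    return np.stack([np.interp(freqs, warped, frame) for frame in mag])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sr = 16000
    t = np.arange(sr) / sr
    clean = np.sin(2 * np.pi * 440 * t)                         # 1 s synthetic tone
    noisy = add_noise(clean, snr_db=20.0, rng=rng)
    spec = np.abs(np.fft.rfft(noisy.reshape(-1, 400), axis=1))  # 40 frames x 201 bins
    augmented = vtlp_spectrogram(spec, alpha=rng.uniform(0.9, 1.1), sample_rate=sr)
    print(spec.shape, augmented.shape)
```

In this sketch a new warp factor `alpha` is drawn per utterance, so each training epoch sees a slightly different spectral envelope; that randomisation is the usual rationale for why VTLP reduces sensitivity to speaker-specific vocal tract characteristics. With `alpha = 1.0` the warp is the identity and the spectrogram is returned unchanged.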

Original language: English
Title of host publication: Proceedings - 5th International Conference on Big Data Computing and Communications, BIGCOM 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 195-203
Number of pages: 9
ISBN (Electronic): 9781728140247
Publication status: Published - Aug 2019
Event: 5th International Conference on Big Data Computing and Communications, BIGCOM 2019 - Qingdao, China
Duration: 9 Aug 2019 to 11 Aug 2019

Publication series

Name: Proceedings - 5th International Conference on Big Data Computing and Communications, BIGCOM 2019

Conference

Conference: 5th International Conference on Big Data Computing and Communications, BIGCOM 2019
Country/Territory: China
City: Qingdao
Period: 9/08/19 to 11/08/19

Keywords

  • data augmentation
  • feature integration
  • speech emotion recognition
