Speech Emotion Recognition Exploiting ASR-based and Phonological Knowledge Representations

Shuang Liang, Xiang Xie*, Qingran Zhan, Hao Cheng

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Speech emotion recognition (SER) is a challenging problem because the available emotion datasets are small. This paper addresses the problem in two ways. First, we exploit two levels of speech representation for the SER task: automatic speech recognition (ASR)-based representations and phonological knowledge representations. Second, we use transfer learning: models are pre-trained on other large corpora for non-SER tasks, and their knowledge is transferred to SER. Our system is divided into two parts: a two-representation learning module and an SER module. We fuse acoustic features with the ASR-based and phonological knowledge representations, both of which are extracted from pre-trained models, and use the fused features for SER training. We also propose a novel multi-task learning approach in which a shared-encoder, multi-decoder model learns the phonological knowledge representations. The Conformer structure is introduced for the SER task, and our study indicates that the Conformer is effective for SER. Finally, experimental results on IEMOCAP show that the proposed method achieves a weighted accuracy of 77.35% and an unweighted accuracy of 77.99%.
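The abstract reports results as weighted accuracy (WA) and unweighted accuracy (UA). The Python sketch that follows shows the definitions commonly used in SER work (WA as overall accuracy, UA as the mean of per-class recalls), together with frame-wise concatenation as one possible way to fuse the three feature streams. The abstract does not say how fusion is performed, so concatenation, the function names (fuse_features, weighted_accuracy, unweighted_accuracy) and the assumption that all streams share one frame rate are hypothetical choices for illustration, not the authors' method.

```python
from collections import Counter
from typing import List, Sequence


def fuse_features(acoustic: Sequence[Sequence[float]],
                  asr_repr: Sequence[Sequence[float]],
                  phono_repr: Sequence[Sequence[float]]) -> List[List[float]]:
    """Hypothetical fusion: concatenate the three streams frame by frame.

    Assumes the acoustic, ASR-based and phonological streams were already
    aligned to the same frame rate, so each has the same number of frames.
    """
    if not (len(acoustic) == len(asr_repr) == len(phono_repr)):
        raise ValueError("all streams must have the same number of frames")
    return [list(a) + list(b) + list(c)
            for a, b, c in zip(acoustic, asr_repr, phono_repr)]


def weighted_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """WA: fraction of all utterances that are classified correctly."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """UA: mean of per-class recalls, so each emotion class counts equally."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


if __name__ == "__main__":
    # Toy labels (not IEMOCAP data): one "hap" utterance is misclassified.
    y_true = ["neu", "neu", "neu", "hap", "ang", "sad"]
    y_pred = ["neu", "neu", "neu", "neu", "ang", "sad"]
    print(f"WA = {weighted_accuracy(y_true, y_pred):.4f}")    # WA = 0.8333
    print(f"UA = {unweighted_accuracy(y_true, y_pred):.4f}")  # UA = 0.7500
```

Because UA averages recall per class, it is not dominated by frequent classes such as neutral: in the toy example WA is 5/6, while UA drops to 3/4 because the only "hap" utterance is missed.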

Original language: English
Title of host publication: ICIAI 2022 - 6th International Conference on Innovation in Artificial Intelligence
Publisher: Association for Computing Machinery
Pages: 216-220
Number of pages: 5
ISBN (Electronic): 9781450395502
Publication status: Published - 4 Mar 2022
Event: 6th International Conference on Innovation in Artificial Intelligence, ICIAI 2022 - Virtual, Online, China
Duration: 4 Mar 2022 to 6 Mar 2022

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 6th International Conference on Innovation in Artificial Intelligence, ICIAI 2022
Country/Territory: China
City: Virtual, Online
Period: 4/03/22 to 6/03/22

Keywords

  • Multi-task learning
  • Speech emotion recognition
  • Transfer learning
