Speech Emotion Recognition Exploiting ASR-based and Phonological Knowledge Representations

Shuang Liang, Xiang Xie*, Qingran Zhan, Hao Cheng

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-reviewed

Abstract

Speech emotion recognition (SER) is a challenging problem because of insufficient training data. This paper addresses the problem from two aspects. First, we exploit two levels of speech representation for the SER task: automatic speech recognition (ASR)-based representations and phonological knowledge representations. Second, we use transfer learning: models are pre-trained on other large corpora for non-SER tasks, and their knowledge is transferred to SER. Our system is divided into two parts: a two-representation learning module and an SER module. We fuse acoustic features with the ASR-based and phonological knowledge representations, both extracted from pre-trained models, and use the fused features for SER training. We also propose a novel multi-task learning approach in which a shared-encoder, multi-decoder model is used to learn the phonological knowledge representations. The Conformer structure is introduced for the SER task, and our study indicates that it is effective for SER. Finally, experimental results on IEMOCAP show that the proposed method achieves a weighted accuracy of 77.35 and an unweighted accuracy of 77.99.
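
The abstract reports both weighted and unweighted accuracy without defining them. As a minimal sketch, assuming the definitions commonly used in SER work (weighted accuracy as overall accuracy across all utterances, unweighted accuracy as the mean of per-class recalls), the two metrics could be computed as follows. The function names and the toy labels are hypothetical and are not taken from the paper or from IEMOCAP.

```python
from collections import defaultdict


def weighted_accuracy(y_true, y_pred):
    """Overall accuracy: correct predictions divided by all utterances."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls, so every emotion class counts equally."""
    total = defaultdict(int)  # utterances per true class
    hit = defaultdict(int)    # correctly recognized utterances per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        hit[t] += int(t == p)
    return sum(hit[c] / total[c] for c in total) / len(total)


# Toy labels, made up for illustration only (assumption, not real data).
y_true = ["ang", "ang", "ang", "hap", "neu", "neu"]
y_pred = ["ang", "ang", "hap", "hap", "neu", "sad"]

wa = weighted_accuracy(y_true, y_pred)
ua = unweighted_accuracy(y_true, y_pred)
```

On class-imbalanced data the two values can differ noticeably, which is presumably why both are reported.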

Original language: English
Title of host publication: ICIAI 2022 - 6th International Conference on Innovation in Artificial Intelligence
Publisher: Association for Computing Machinery
Pages: 216-220
Number of pages: 5
ISBN (electronic): 9781450395502
Publication status: Published - 4 Mar 2022
Event: 6th International Conference on Innovation in Artificial Intelligence, ICIAI 2022 - Virtual, Online, China
Duration: 4 Mar 2022 to 6 Mar 2022

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 6th International Conference on Innovation in Artificial Intelligence, ICIAI 2022
Country/Territory: China
Virtual, Online
Period: 4/03/22 to 6/03/22
