TY - GEN
T1 - Unsupervised Cross-Corpus Speech Emotion Recognition Using Domain-Adaptive Subspace Learning
AU - Liu, Na
AU - Zong, Yuan
AU - Zhang, Baofeng
AU - Liu, Li
AU - Chen, Jie
AU - Zhao, Guoying
AU - Zhu, Junchao
N1 - Publisher Copyright:
© 2018 IEEE.
PY - 2018/9/10
Y1 - 2018/9/10
N2 - In this paper, we investigate the problem of unsupervised cross-corpus speech emotion recognition (SER), in which the training and testing speech signals come from two different speech emotion corpora. The training speech signals are labeled, whereas the labels of the testing speech signals are entirely unknown. Under this setting, the training (source) and testing (target) speech signals may have different feature distributions, and hence many existing SER methods fail. To address this problem, we propose a domain-adaptive subspace learning (DoSL) method that learns a projection matrix transforming the source and target speech signals from the original feature space to the label space. In the label space, the transformed source and target speech signals have similar feature distributions. Consequently, a classifier learned on the labeled source speech signals can effectively predict the emotional states of the unlabeled target speech signals. To evaluate the proposed DoSL method, we carry out extensive cross-corpus SER experiments on three speech emotion corpora: EmoDB, eNTERFACE, and AFEW 4.0. Compared with recent state-of-the-art cross-corpus SER methods, DoSL achieves more satisfactory overall results.
AB - In this paper, we investigate the problem of unsupervised cross-corpus speech emotion recognition (SER), in which the training and testing speech signals come from two different speech emotion corpora. The training speech signals are labeled, whereas the labels of the testing speech signals are entirely unknown. Under this setting, the training (source) and testing (target) speech signals may have different feature distributions, and hence many existing SER methods fail. To address this problem, we propose a domain-adaptive subspace learning (DoSL) method that learns a projection matrix transforming the source and target speech signals from the original feature space to the label space. In the label space, the transformed source and target speech signals have similar feature distributions. Consequently, a classifier learned on the labeled source speech signals can effectively predict the emotional states of the unlabeled target speech signals. To evaluate the proposed DoSL method, we carry out extensive cross-corpus SER experiments on three speech emotion corpora: EmoDB, eNTERFACE, and AFEW 4.0. Compared with recent state-of-the-art cross-corpus SER methods, DoSL achieves more satisfactory overall results.
KW - Cross-corpus evaluation
KW - Domain adaptation
KW - Speech emotion recognition
KW - Subspace learning
UR - https://www.scopus.com/pages/publications/85054232561
U2 - 10.1109/ICASSP.2018.8461848
DO - 10.1109/ICASSP.2018.8461848
M3 - Conference contribution
AN - SCOPUS:85054232561
SN - 9781538646588
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 5144
EP - 5148
BT - 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018
Y2 - 15 April 2018 through 20 April 2018
ER -