TY - JOUR
T1 - BIT-MI Deep Learning-based Model to Non-intrusive Speech Quality Assessment Challenge in Online Conferencing Applications
AU - Liu, Miao
AU - Wang, Jing
AU - Xu, Liang
AU - Zhang, Jianqian
AU - Li, Shicong
AU - Xiang, Fei
N1 - Publisher Copyright: Copyright © 2022 ISCA.
PY - 2022
Y1 - 2022
AB - This paper presents the details of the BIT-MI deep learning-based model submitted to the ConferencingSpeech 2022 challenge. Because subjective tests are costly in time and labor, the challenge aims to promote research on non-intrusive objective quality assessment for speech communication, targeting effective evaluation of speech quality in online conferencing applications. We propose a novel deep learning-based model comprising a new convolutional neural network (CNN) architecture, a bidirectional long short-term memory (BLSTM) network, average pooling, and a range clipping method. In addition, we construct a two-part target function that combines the mean square error (MSE) and the Pearson correlation coefficient (PCC) between predictions and labels, so that the assessment model is jointly optimized for both absolute prediction error and linear correlation with the labels. Experimental results show that the proposed model significantly outperforms the official baseline system on both the validation and test sets.
KW - deep learning
KW - speech quality assessment
KW - target function
UR - http://www.scopus.com/inward/record.url?scp=85140081752&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2022-10010
DO - 10.21437/Interspeech.2022-10010
M3 - Conference article
AN - SCOPUS:85140081752
SN - 2308-457X
VL - 2022-September
SP - 3288
EP - 3292
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
T2 - 23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022
Y2 - 18 September 2022 through 22 September 2022
ER -