BIT-MI Deep Learning-based Model to Non-intrusive Speech Quality Assessment Challenge in Online Conferencing Applications

Miao Liu, Jing Wang, Liang Xu, Jianqian Zhang, Shicong Li, Fei Xiang

Research output: Contribution to journal › Conference article › peer-review

6 Citations (Scopus)

Abstract

This paper presents the details of the BIT-MI deep learning-based model submitted to the ConferencingSpeech challenge 2022. Because subjective tests are costly in time and labor, the challenge aims to promote research on non-intrusive objective quality assessment for speech communication, targeting effective evaluation of speech quality in online conferencing applications. We propose a novel deep learning-based model that combines a new convolutional neural network (CNN) architecture, a bidirectional long short-term memory (BLSTM) network, average pooling, and a range-clipping method. In addition, we construct a two-part target function that combines the mean square error (MSE) and the Pearson correlation coefficient (PCC) between predictions and labels, so that the assessment model is jointly optimized on both criteria. Experimental results show that the proposed model significantly outperforms the official baseline system on both the validation and test sets.
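The abstract does not give the exact form of the two-part target function or of the pooling and clipping steps, so the following is only a minimal Python sketch under stated assumptions. It assumes the loss is a weighted sum alpha * MSE + beta * (1 - PCC), where the weights `alpha` and `beta` are hypothetical; that average pooling collapses frame-level scores into one utterance-level score; and that range clipping bounds that score to a 1 to 5 MOS-style scale (also an assumption). The function names `mse`, `pcc`, `two_part_loss` and `utterance_score` are illustrative and are not the authors' code.

```python
import math
from typing import Sequence


def mse(pred: Sequence[float], label: Sequence[float]) -> float:
    """Mean square error between predicted and labeled scores."""
    if len(pred) != len(label) or not pred:
        raise ValueError("pred and label must be non-empty and of equal length")
    return sum((p - y) ** 2 for p, y in zip(pred, label)) / len(pred)


def pcc(pred: Sequence[float], label: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    if len(pred) != len(label) or len(pred) < 2:
        raise ValueError("need at least two paired scores")
    n = len(pred)
    mean_p = sum(pred) / n
    mean_y = sum(label) / n
    cov = sum((p - mean_p) * (y - mean_y) for p, y in zip(pred, label))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_y = sum((y - mean_y) ** 2 for y in label)
    denom = math.sqrt(var_p * var_y)
    return cov / denom if denom > 0 else 0.0


def two_part_loss(pred: Sequence[float], label: Sequence[float],
                  alpha: float = 1.0, beta: float = 1.0) -> float:
    """Assumed form of the target: alpha * MSE + beta * (1 - PCC).

    Minimizing (1 - PCC) pushes the correlation toward 1, while the MSE
    term keeps predictions close to the labels in absolute value.
    alpha and beta are hypothetical weights, not values from the paper.
    """
    return alpha * mse(pred, label) + beta * (1.0 - pcc(pred, label))


def utterance_score(frame_scores: Sequence[float],
                    lo: float = 1.0, hi: float = 5.0) -> float:
    """Average-pool frame-level scores, then clip the result to [lo, hi].

    The default 1-5 range assumes a MOS-style label scale (assumption).
    """
    if not frame_scores:
        raise ValueError("frame_scores must be non-empty")
    avg = sum(frame_scores) / len(frame_scores)
    return min(max(avg, lo), hi)


if __name__ == "__main__":
    # Hypothetical frame-level outputs for a batch of three utterances.
    batch_frames = [[4.0, 6.0, 7.0], [2.0, 3.0], [0.5, 1.0]]
    preds = [utterance_score(f) for f in batch_frames]
    labels = [4.5, 2.4, 1.2]  # hypothetical subjective labels
    print(preds)
    print(two_part_loss(preds, labels))
```

Because PCC is a batch-level statistic, the correlation term is only meaningful for batches of two or more utterances. A real training loop would express the same formula with differentiable tensor operations in a deep learning framework rather than plain Python lists.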

Original language: English
Pages (from-to): 3288-3292
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2022-September
Publication status: Published, 2022
Event: 23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022, Incheon, Korea, Republic of
Duration: 18 Sept 2022 to 22 Sept 2022

Keywords

  • deep learning
  • speech quality assessment
  • target function
