TY - JOUR
T1 - Mapping methods for output-based objective speech quality assessment using data mining
AU - Wang, Jing
AU - Zhao, Sheng Hui
AU - Xie, Xiang
AU - Kuang, Jing Ming
PY - 2014/5
Y1 - 2014/5
N2 - Objective speech quality is difficult to measure without the input reference speech. Mapping methods using data mining are investigated and designed to improve the output-based speech quality assessment algorithm. The degraded speech is first separated into three classes (unvoiced, voiced and silence); the consistency between the degraded speech signal and the pre-trained reference model for each class is then calculated and mapped to an objective speech quality score using data mining. A fuzzy Gaussian mixture model (GMM) is used to generate the artificial reference model, which is trained on perceptual linear predictive (PLP) features. Mean opinion score (MOS) mapping methods, including multivariate non-linear regression (MNLR), fuzzy neural network (FNN) and support vector regression (SVR), are designed and compared with the standard ITU-T P.563 method. Experimental results show that the assessment methods with data mining perform better than ITU-T P.563. Moreover, FNN and SVR are more efficient than MNLR, and FNN performs best, with a 14.50% increase in the correlation coefficient and a 32.76% decrease in the root-mean-square MOS error.
AB - Objective speech quality is difficult to measure without the input reference speech. Mapping methods using data mining are investigated and designed to improve the output-based speech quality assessment algorithm. The degraded speech is first separated into three classes (unvoiced, voiced and silence); the consistency between the degraded speech signal and the pre-trained reference model for each class is then calculated and mapped to an objective speech quality score using data mining. A fuzzy Gaussian mixture model (GMM) is used to generate the artificial reference model, which is trained on perceptual linear predictive (PLP) features. Mean opinion score (MOS) mapping methods, including multivariate non-linear regression (MNLR), fuzzy neural network (FNN) and support vector regression (SVR), are designed and compared with the standard ITU-T P.563 method. Experimental results show that the assessment methods with data mining perform better than ITU-T P.563. Moreover, FNN and SVR are more efficient than MNLR, and FNN performs best, with a 14.50% increase in the correlation coefficient and a 32.76% decrease in the root-mean-square MOS error.
KW - data mining
KW - fuzzy neural network
KW - multivariate non-linear regression
KW - objective speech quality
KW - support vector regression
UR - http://www.scopus.com/inward/record.url?scp=84900872635&partnerID=8YFLogxK
U2 - 10.1007/s11771-014-2138-6
DO - 10.1007/s11771-014-2138-6
M3 - Article
AN - SCOPUS:84900872635
SN - 2095-2899
VL - 21
SP - 1919
EP - 1926
JO - Journal of Central South University
JF - Journal of Central South University
IS - 5
ER -