Mapping methods for output-based objective speech quality assessment using data mining

Jing Wang; Sheng Hui Zhao; Xiang Xie; Jing Ming Kuang

doi:10.1007/s11771-014-2138-6


Jing Wang*, Sheng Hui Zhao, Xiang Xie, Jing Ming Kuang

*Corresponding author for this work

School of Information and Electronics

Beijing Institute of Technology

Research output: Contribution to journal › Article › peer-review

7 Citations (Scopus)

Abstract

Objective speech quality is difficult to measure without the input reference speech. Mapping methods based on data mining are investigated and designed to improve an output-based speech quality assessment algorithm. The degraded speech is first separated into three classes (unvoiced, voiced and silence). Then, for each class, the consistency between the degraded speech signal and a pre-trained reference model is measured and mapped to an objective speech quality score using data mining. A fuzzy Gaussian mixture model (GMM) trained on perceptual linear predictive (PLP) features is used to generate the artificial reference model. Three mean opinion score (MOS) mapping methods, namely multivariate non-linear regression (MNLR), a fuzzy neural network (FNN) and support vector regression (SVR), are designed and compared with the standard ITU-T P.563 method. Experimental results show that the data-mining-based assessment methods perform better than ITU-T P.563. Moreover, FNN and SVR are more efficient than MNLR, and FNN performs best, with a 14.50% increase in the correlation coefficient and a 32.76% decrease in the root-mean-square MOS error.
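The abstract reports its gains in two standard measures of agreement between objective and subjective MOS: the correlation coefficient and the root-mean-square MOS error. The following is a minimal sketch of how these measures and the percentage changes could be computed. It assumes that the correlation is Pearson's r and that the percentages are relative changes with respect to the ITU-T P.563 baseline. The abstract does not state either point. The function names and the toy MOS values are hypothetical and are not data from the paper.

```python
import math
from typing import Sequence


def pearson_r(pred: Sequence[float], target: Sequence[float]) -> float:
    """Pearson correlation between predicted (objective) and subjective MOS."""
    if len(pred) != len(target) or len(pred) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mp = sum(pred) / len(pred)
    mt = sum(target) / len(target)
    cov = sum((p - mp) * (t - mt) for p, t in zip(pred, target))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    st = math.sqrt(sum((t - mt) ** 2 for t in target))
    if sp == 0.0 or st == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sp * st)


def rmse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Root-mean-square error between predicted and subjective MOS."""
    if len(pred) != len(target) or not pred:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred))


def relative_change(new: float, baseline: float) -> float:
    """Signed relative change in percent (positive = increase over baseline).

    Assumption: the abstract's percentages are relative to the baseline value.
    """
    return (new - baseline) / baseline * 100.0


if __name__ == "__main__":
    # Hypothetical toy data, NOT results from the paper.
    subjective = [1.2, 2.5, 3.1, 3.8, 4.4]
    baseline_pred = [1.9, 2.2, 3.6, 3.2, 4.0]   # stand-in for a P.563-style score
    mapped_pred = [1.4, 2.6, 3.0, 3.6, 4.3]     # stand-in for a data-mining mapping

    r_b, e_b = pearson_r(baseline_pred, subjective), rmse(baseline_pred, subjective)
    r_m, e_m = pearson_r(mapped_pred, subjective), rmse(mapped_pred, subjective)
    print(f"r:    {r_b:.3f} -> {r_m:.3f} ({relative_change(r_m, r_b):+.2f}%)")
    print(f"RMSE: {e_b:.3f} -> {e_m:.3f} ({relative_change(e_m, e_b):+.2f}%)")
```

A higher r together with a lower RMSE means that the mapped objective scores track the listener ratings more closely. This is the direction of change the abstract reports for FNN relative to ITU-T P.563.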

Original language	English
Pages (from-to)	1919-1926
Number of pages	8
Journal	Journal of Central South University
Volume	21
Issue number	5
DOI	https://doi.org/10.1007/s11771-014-2138-6
Publication status	Published - May 2014


Cite this

@article{331838fc7d8649d38a27fae648cc9c58,

title = "Mapping methods for output-based objective speech quality assessment using data mining",

abstract = "Objective speech quality is difficult to be measured without the input reference speech. Mapping methods using data mining are investigated and designed to improve the output-based speech quality assessment algorithm. The degraded speech is firstly separated into three classes (unvoiced, voiced and silence), and then the consistency measurement between the degraded speech signal and the pre-trained reference model for each class is calculated and mapped to an objective speech quality score using data mining. Fuzzy Gaussian mixture model (GMM) is used to generate the artificial reference model trained on perceptual linear predictive (PLP) features. The mean opinion score (MOS) mapping methods including multivariate non-linear regression (MNLR), fuzzy neural network (FNN) and support vector regression (SVR) are designed and compared with the standard ITU-T P.563 method. Experimental results show that the assessment methods with data mining perform better than ITU-T P.563. Moreover, FNN and SVR are more efficient than MNLR, and FNN performs best with 14.50% increase in the correlation coefficient and 32.76% decrease in the root-mean-square MOS error.",

keywords = "data mining, fuzzy neural network, multivariate non-linear regression, objective speech quality, support vector regression",

author = "Jing Wang and Zhao, {Sheng Hui} and Xiang Xie and Kuang, {Jing Ming}",

year = "2014",

month = may,

doi = "10.1007/s11771-014-2138-6",

language = "English",

volume = "21",

pages = "1919--1926",

journal = "Journal of Central South University",

issn = "2095-2899",

publisher = "Springer Science + Business Media",

number = "5",

}

TY  - JOUR
T1  - Mapping methods for output-based objective speech quality assessment using data mining
AU  - Wang, Jing
AU  - Zhao, Sheng Hui
AU  - Xie, Xiang
AU  - Kuang, Jing Ming
PY  - 2014/5
Y1  - 2014/5
N2  - Objective speech quality is difficult to be measured without the input reference speech. Mapping methods using data mining are investigated and designed to improve the output-based speech quality assessment algorithm. The degraded speech is firstly separated into three classes (unvoiced, voiced and silence), and then the consistency measurement between the degraded speech signal and the pre-trained reference model for each class is calculated and mapped to an objective speech quality score using data mining. Fuzzy Gaussian mixture model (GMM) is used to generate the artificial reference model trained on perceptual linear predictive (PLP) features. The mean opinion score (MOS) mapping methods including multivariate non-linear regression (MNLR), fuzzy neural network (FNN) and support vector regression (SVR) are designed and compared with the standard ITU-T P.563 method. Experimental results show that the assessment methods with data mining perform better than ITU-T P.563. Moreover, FNN and SVR are more efficient than MNLR, and FNN performs best with 14.50% increase in the correlation coefficient and 32.76% decrease in the root-mean-square MOS error.
AB  - Objective speech quality is difficult to be measured without the input reference speech. Mapping methods using data mining are investigated and designed to improve the output-based speech quality assessment algorithm. The degraded speech is firstly separated into three classes (unvoiced, voiced and silence), and then the consistency measurement between the degraded speech signal and the pre-trained reference model for each class is calculated and mapped to an objective speech quality score using data mining. Fuzzy Gaussian mixture model (GMM) is used to generate the artificial reference model trained on perceptual linear predictive (PLP) features. The mean opinion score (MOS) mapping methods including multivariate non-linear regression (MNLR), fuzzy neural network (FNN) and support vector regression (SVR) are designed and compared with the standard ITU-T P.563 method. Experimental results show that the assessment methods with data mining perform better than ITU-T P.563. Moreover, FNN and SVR are more efficient than MNLR, and FNN performs best with 14.50% increase in the correlation coefficient and 32.76% decrease in the root-mean-square MOS error.
KW  - data mining
KW  - fuzzy neural network
KW  - multivariate non-linear regression
KW  - objective speech quality
KW  - support vector regression
UR  - http://www.scopus.com/inward/record.url?scp=84900872635&partnerID=8YFLogxK
U2  - 10.1007/s11771-014-2138-6
DO  - 10.1007/s11771-014-2138-6
M3  - Article
AN  - SCOPUS:84900872635
SN  - 2095-2899
VL  - 21
SP  - 1919
EP  - 1926
JO  - Journal of Central South University
JF  - Journal of Central South University
IS  - 5
ER  - 
