Output-based speech quality assessment using autoencoder and support vector regression

Jing Wang; Yahui Shan; Xiang Xie; Jingming Kuang

doi:10.1016/j.specom.2019.04.002

Jing Wang^*, Yahui Shan, Xiang Xie, Jingming Kuang

^*Corresponding author for this work

School of Information and Electronics

Beijing Institute of Technology

Research output: Contribution to journal › Article › peer-review

8 Citations (Scopus)

Abstract

The output-based speech quality assessment method has been widely used and received increasing attention since it does not need undistorted signals as reference. In order to obtain a high correlation between the predicted scores and subjective results, this paper presents a new speech quality assessment method to estimate the quality of degraded speech without the reference speech. Bottleneck features are extracted with autoencoder and support vector regression is chosen as mapping model from objective representation to subjective scores. Experiments are conducted in a dataset containing various degraded speech signals and subjective listening scores. The proposed method takes advantage of autoencoder in forming a good representation of its input which can be better mapped to Mean Opinion Score. The experimental results show that compared with the standardization ITU-T P.563 and another deep learning-based assessment method, the proposed method brings about a higher correlation coefficient between predicted scores and subjective scores.
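The pipeline the abstract describes (autoencoder bottleneck features mapped to Mean Opinion Score by support vector regression, then scored by correlation) might be sketched as follows. This is a minimal illustration, not the authors' implementation: the synthetic feature vectors and MOS-like labels, the 40-dimensional input, the 8-unit bottleneck, the single hidden layer, and all hyperparameters are assumptions chosen only to make the example run with scikit-learn.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumption): 200 utterances with 40-dim feature vectors
# and MOS-like labels in [1, 5]. The paper's real features and dataset differ.
X = rng.normal(size=(200, 40))
mos = np.clip(
    3.0 + 0.4 * X[:, :5].sum(axis=1) + rng.normal(scale=0.2, size=200),
    1.0,
    5.0,
)

X_train, X_test = X[:150], X[150:]
y_train, y_test = mos[:150], mos[150:]

# Autoencoder (assumed architecture): one hidden layer trained to
# reconstruct its own input, so the hidden layer acts as the bottleneck.
autoencoder = MLPRegressor(
    hidden_layer_sizes=(8,), activation="tanh", max_iter=2000, random_state=0
)
autoencoder.fit(X_train, X_train)


def bottleneck(model, features):
    """Return the hidden-layer activations of the fitted autoencoder."""
    return np.tanh(features @ model.coefs_[0] + model.intercepts_[0])


Z_train = bottleneck(autoencoder, X_train)
Z_test = bottleneck(autoencoder, X_test)

# Support vector regression maps bottleneck features to MOS-like scores.
svr = SVR(kernel="rbf", C=1.0, epsilon=0.1)
svr.fit(Z_train, y_train)
pred = svr.predict(Z_test)

# Pearson correlation between predicted and "subjective" scores.
corr = np.corrcoef(pred, y_test)[0, 1]
print(f"Pearson correlation on held-out synthetic data: {corr:.3f}")
```

The autoencoder is fitted without labels, so only the regression stage needs subjective scores; the correlation printed here reflects the synthetic data and says nothing about the results reported in the paper.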

Original language	English
Pages (from-to)	13-20
Number of pages	8
Journal	Speech Communication
Volume	110
DOI	https://doi.org/10.1016/j.specom.2019.04.002
Publication status	Published - Jul 2019

Cite this

@article{f06b491de4fd4d519363b65c5d1a827b,

title = "Output-based speech quality assessment using autoencoder and support vector regression",

abstract = "The output-based speech quality assessment method has been widely used and received increasing attention since it does not need undistorted signals as reference. In order to obtain a high correlation between the predicted scores and subjective results, this paper presents a new speech quality assessment method to estimate the quality of degraded speech without the reference speech. Bottleneck features are extracted with autoencoder and support vector regression is chosen as mapping model from objective representation to subjective scores. Experiments are conducted in a dataset containing various degraded speech signals and subjective listening scores. The proposed method takes advantage of autoencoder in forming a good representation of its input which can be better mapped to Mean Opinion Score. The experimental results show that compared with the standardization ITU-T P.563 and another deep learning-based assessment method, the proposed method brings about a higher correlation coefficient between predicted scores and subjective scores.",

keywords = "Bottleneck feature, Speech quality assessment, Support vector regression",

author = "Jing Wang and Yahui Shan and Xiang Xie and Jingming Kuang",

note = "Publisher Copyright: {\textcopyright} 2019 Elsevier B.V.",

year = "2019",

month = jul,

doi = "10.1016/j.specom.2019.04.002",

language = "English",

volume = "110",

pages = "13--20",

journal = "Speech Communication",

issn = "0167-6393",

publisher = "Elsevier B.V.",

}

TY - JOUR

T1 - Output-based speech quality assessment using autoencoder and support vector regression

AU - Wang, Jing

AU - Shan, Yahui

AU - Xie, Xiang

AU - Kuang, Jingming

PY - 2019/7

Y1 - 2019/7

N2 - The output-based speech quality assessment method has been widely used and received increasing attention since it does not need undistorted signals as reference. In order to obtain a high correlation between the predicted scores and subjective results, this paper presents a new speech quality assessment method to estimate the quality of degraded speech without the reference speech. Bottleneck features are extracted with autoencoder and support vector regression is chosen as mapping model from objective representation to subjective scores. Experiments are conducted in a dataset containing various degraded speech signals and subjective listening scores. The proposed method takes advantage of autoencoder in forming a good representation of its input which can be better mapped to Mean Opinion Score. The experimental results show that compared with the standardization ITU-T P.563 and another deep learning-based assessment method, the proposed method brings about a higher correlation coefficient between predicted scores and subjective scores.

AB - The output-based speech quality assessment method has been widely used and received increasing attention since it does not need undistorted signals as reference. In order to obtain a high correlation between the predicted scores and subjective results, this paper presents a new speech quality assessment method to estimate the quality of degraded speech without the reference speech. Bottleneck features are extracted with autoencoder and support vector regression is chosen as mapping model from objective representation to subjective scores. Experiments are conducted in a dataset containing various degraded speech signals and subjective listening scores. The proposed method takes advantage of autoencoder in forming a good representation of its input which can be better mapped to Mean Opinion Score. The experimental results show that compared with the standardization ITU-T P.563 and another deep learning-based assessment method, the proposed method brings about a higher correlation coefficient between predicted scores and subjective scores.

KW - Bottleneck feature

KW - Speech quality assessment

KW - Support vector regression

UR - http://www.scopus.com/inward/record.url?scp=85064258549&partnerID=8YFLogxK

U2 - 10.1016/j.specom.2019.04.002

DO - 10.1016/j.specom.2019.04.002

M3 - Article

AN - SCOPUS:85064258549

SN - 0167-6393

VL - 110

SP - 13

EP - 20

JO - Speech Communication

JF - Speech Communication

ER -
