TY - GEN
T1 - Non-intrusive speech quality assessment using deep belief network and backpropagation neural network
AU - Shan, Yahui
AU - Wang, Jing
AU - Xie, Xiang
AU - Meng, Liuchen
AU - Kuang, Jingming
N1 - Publisher Copyright: © 2018 IEEE
PY - 2018/7/2
Y1 - 2018/7/2
AB - In this paper, we present a new speech quality assessment method that estimates the quality of degraded speech without a reference signal. Traditional non-intrusive assessment methods cannot achieve high consistency with subjective results because they lack the original reference signal. To address this, a deep belief network is trained to produce a pseudo-reference speech signal from the degraded speech. Mel-frequency cepstrum coefficients are then extracted from the pseudo-reference speech and the degraded speech, and the feature differences between them are calculated. These feature differences are mapped to a speech quality score using a backpropagation neural network. Experiments are conducted on a dataset containing various degraded speech signals and subjective listening scores. Compared with the ITU-T P.563 standard, a Gaussian mixture model method, and an autoencoder-based method, the proposed method achieves a higher correlation coefficient between predicted scores and subjective scores.
KW - Backpropagation neural network
KW - Deep belief network
KW - Mel-frequency cepstrum coefficients
KW - Non-intrusive speech quality assessment
UR - http://www.scopus.com/inward/record.url?scp=85065881483&partnerID=8YFLogxK
U2 - 10.1109/ISCSLP.2018.8706696
DO - 10.1109/ISCSLP.2018.8706696
M3 - Conference contribution
AN - SCOPUS:85065881483
T3 - 2018 11th International Symposium on Chinese Spoken Language Processing, ISCSLP 2018 - Proceedings
SP - 71
EP - 75
BT - 2018 11th International Symposium on Chinese Spoken Language Processing, ISCSLP 2018 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 11th International Symposium on Chinese Spoken Language Processing, ISCSLP 2018
Y2 - 26 November 2018 through 29 November 2018
ER -