Speech bandwidth expansion based on deep neural networks

Yingxue Wang; Shenghui Zhao; Wenbo Liu; Ming Li; Jingming Kuang

School of Information and Electronics

Research output: Contribution to journal › Conference article › Peer-reviewed

38 citations (Scopus)

Abstract

This paper proposes a new speech bandwidth expansion method, which uses Deep Neural Networks (DNNs) to build high-order eigenspaces between the low frequency components and the high frequency components of the speech signal. A four-layer DNN is trained layer-by-layer from a cascade of Neural Networks (NNs) and two Gaussian-Bernoulli Restricted Boltzmann Machines (GBRBMs). The GBRBMs are adopted to model the distributions of the low-frequency and high-frequency spectral envelopes, respectively. The NNs are used to model the joint distribution of the hidden variables extracted from the two GBRBMs. The proposed method takes advantage of the strong ability of GBRBMs to model the distribution of spectral envelopes. Both objective and subjective test results show that the proposed method outperforms the conventional GMM-based method.
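
The cascade described in the abstract can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation: the layer sizes, the random untrained weights, the unit-variance visible units, the single sigmoid layer standing in for the NNs, and the names `init_params` and `expand_envelope` are all assumptions made for illustration. The sketch only shows how a low-band spectral envelope could pass through the low-band GBRBM's hidden layer, a mapping network, and the high-band GBRBM's visible layer.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(x, weights, bias):
    """Return x @ weights + bias for a row vector x and a list-of-rows matrix."""
    return [
        sum(xi * row[j] for xi, row in zip(x, weights)) + bias[j]
        for j in range(len(bias))
    ]


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def random_matrix(rows, cols, rng, scale=0.1):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def init_params(n_low, n_hid_low, n_hid_high, n_high, seed=0):
    """Hypothetical, untrained parameters; real ones would come from training."""
    rng = random.Random(seed)
    return {
        "W_low": random_matrix(n_low, n_hid_low, rng),    # low-band GBRBM weights
        "c_low": [0.0] * n_hid_low,                       # its hidden biases
        "W_nn": random_matrix(n_hid_low, n_hid_high, rng),  # mapping network
        "b_nn": [0.0] * n_hid_high,
        "W_high": random_matrix(n_high, n_hid_high, rng),   # high-band GBRBM weights
        "b_high": [0.0] * n_high,                         # its visible biases
    }


def expand_envelope(low_env, params):
    """Map a low-band spectral envelope to an estimated high-band envelope."""
    # 1) Low-band GBRBM, visible -> hidden: p(h=1|v) = sigmoid(c + v W),
    #    assuming unit-variance Gaussian visible units.
    h_low = [sigmoid(a) for a in affine(low_env, params["W_low"], params["c_low"])]
    # 2) Mapping network between the two hidden representations.
    h_high = [sigmoid(a) for a in affine(h_low, params["W_nn"], params["b_nn"])]
    # 3) High-band GBRBM, hidden -> visible: mean of v given h is b + W h.
    return affine(h_high, transpose(params["W_high"]), params["b_high"])


params = init_params(n_low=8, n_hid_low=16, n_hid_high=16, n_high=6)
low_env = [0.1 * k for k in range(8)]
high_env = expand_envelope(low_env, params)
print(len(high_env))
```

With untrained weights the output values are meaningless; the sketch only fixes the data flow. In a trained system each GBRBM would first be fitted to its own band's envelopes and the mapping layer would then be trained on the paired hidden activations.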

Original language	English
Pages (from-to)	2593-2597
Number of pages	5
Journal	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume	2015-January
Publication status	Published - 2015
Event	16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015 - Dresden, Germany. Duration: 6 Sep 2015 → 10 Sep 2015

Cite this

@article{0960775a07384e0cbb0cbcbf4ad1b97b,

title = "Speech bandwidth expansion based on deep neural networks",

abstract = "This paper proposes a new speech bandwidth expansion method, which uses Deep Neural Networks (DNNs) to build high-order eigenspaces between the low frequency components and the high frequency components of the speech signal. A four-layer DNN is trained layer-by-layer from a cascade of Neural Networks (NNs) and two Gaussian-Bernoulli Restricted Boltzmann Machines (GBRBMs). The GBRBMs are adopted to model the distribution of spectral envelopes of the low frequency and the high frequency respectively. The NNs are used to model the joint distribution of hidden variables extracted from the two GBRBMs. The proposed method takes advantage of the strong modeling ability of GBRBMs in modeling the distribution of the spectral envelopes. And both the objective and subjective test results show that the proposed method outperforms the conventional GMM based method.",

keywords = "Bandwidth extension, Deep neural networks, Gaussian-Bernoulli Restricted Boltzmann Machine, Neural networks",

author = "Yingxue Wang and Shenghui Zhao and Wenbo Liu and Ming Li and Jingming Kuang",

note = "Publisher Copyright: Copyright {\textcopyright} 2015 ISCA.; 16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015 ; Conference date: 06-09-2015 Through 10-09-2015",

year = "2015",

language = "English",

volume = "2015-January",

pages = "2593--2597",

journal = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",

issn = "2308-457X",

}

TY - JOUR

T1 - Speech bandwidth expansion based on deep neural networks

AU - Wang, Yingxue

AU - Zhao, Shenghui

AU - Liu, Wenbo

AU - Li, Ming

AU - Kuang, Jingming

PY - 2015

Y1 - 2015

N2 - This paper proposes a new speech bandwidth expansion method, which uses Deep Neural Networks (DNNs) to build high-order eigenspaces between the low frequency components and the high frequency components of the speech signal. A four-layer DNN is trained layer-by-layer from a cascade of Neural Networks (NNs) and two Gaussian-Bernoulli Restricted Boltzmann Machines (GBRBMs). The GBRBMs are adopted to model the distribution of spectral envelopes of the low frequency and the high frequency respectively. The NNs are used to model the joint distribution of hidden variables extracted from the two GBRBMs. The proposed method takes advantage of the strong modeling ability of GBRBMs in modeling the distribution of the spectral envelopes. And both the objective and subjective test results show that the proposed method outperforms the conventional GMM based method.

AB - This paper proposes a new speech bandwidth expansion method, which uses Deep Neural Networks (DNNs) to build high-order eigenspaces between the low frequency components and the high frequency components of the speech signal. A four-layer DNN is trained layer-by-layer from a cascade of Neural Networks (NNs) and two Gaussian-Bernoulli Restricted Boltzmann Machines (GBRBMs). The GBRBMs are adopted to model the distribution of spectral envelopes of the low frequency and the high frequency respectively. The NNs are used to model the joint distribution of hidden variables extracted from the two GBRBMs. The proposed method takes advantage of the strong modeling ability of GBRBMs in modeling the distribution of the spectral envelopes. And both the objective and subjective test results show that the proposed method outperforms the conventional GMM based method.

KW - Bandwidth extension

KW - Deep neural networks

KW - Gaussian-Bernoulli Restricted Boltzmann Machine

KW - Neural networks

UR - http://www.scopus.com/inward/record.url?scp=84959151466&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:84959151466

SN - 2308-457X

VL - 2015-January

SP - 2593

EP - 2597

JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

T2 - 16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015

Y2 - 6 September 2015 through 10 September 2015

ER -
