Speech Bandwidth Extension Using Recurrent Temporal Restricted Boltzmann Machines

Yingxue Wang, Shenghui Zhao^*, Jianxin Li, Jingming Kuang

doi:10.1109/LSP.2016.2621053

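^*Corresponding author for this work

School of Information and Electronics

Research output: Contribution to journal › Article › peer-review

14 Citations (Scopus)

Abstract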

In this paper, we present a new speech bandwidth extension (BWE) method using the recurrent temporal restricted Boltzmann machine (RTRBM). The conventional Gaussian mixture model (GMM)-based and deep neural network (DNN)-based BWE methods perform stably and effectively. However, the mapping function of GMM-based methods is a piecewise linear transformation, which is insufficient to model the complex nonlinear mapping relationship between the spectral envelope features of the low-frequency (LF) and high-frequency (HF) bands. Conventional DNN-based methods ignore the temporal correlations across speech frames, which causes a loss of spectral detail in the speech reconstructed by BWE. To counter these issues, a multilayer DNN composed of two RTRBMs and a feedforward neural network (NN) is employed to capture temporal information and to model the deep nonlinear relationships between the LF and HF spectral envelope features. The proposed method takes advantage of the strong ability of the RTRBM to discover temporal correlations in the high-order space and to model deep nonlinear relationships between input and output. Both objective and subjective evaluations indicate that the proposed method outperforms the conventional GMM-based methods and other NN-based methods.
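The abstract describes the architecture only at a high level. As a rough illustration of how an RTRBM carries information from one frame to the next, the sketch below implements the standard mean-field hidden-state recurrence of an RTRBM, h_t = sigmoid(W v_t + U h_{t-1} + b), stacks two such layers, and maps the top-layer states to HF features with a single linear output layer. This is not the authors' implementation: the weights are random and untrained, and the feature dimensions, the use of a purely linear output layer in place of the paper's feedforward NN, and the helper names (`RTRBMLayer`, `extend_bandwidth`) are all hypothetical. Training (e.g. contrastive divergence pretraining and fine-tuning) is omitted entirely.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def random_matrix(rows: int, cols: int, rng: random.Random, scale: float = 0.1) -> Matrix:
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class RTRBMLayer:
    """Mean-field hidden-state recurrence of an RTRBM (inference only, untrained).

    h_0 = sigmoid(W v_0 + b_init)
    h_t = sigmoid(W v_t + U h_{t-1} + b)   for t > 0
    """

    def __init__(self, n_vis: int, n_hid: int, rng: random.Random) -> None:
        self.W = random_matrix(n_hid, n_vis, rng)  # visible-to-hidden weights
        self.U = random_matrix(n_hid, n_hid, rng)  # hidden-to-hidden (temporal) weights
        self.b = [0.0] * n_hid                     # hidden bias for t > 0
        self.b_init = [0.0] * n_hid                # hidden bias for the first frame

    def forward(self, frames: List[Vector]) -> List[Vector]:
        states: List[Vector] = []
        prev = None
        for v in frames:
            drive = matvec(self.W, v)
            if prev is None:
                pre = [d + b0 for d, b0 in zip(drive, self.b_init)]
            else:
                rec = matvec(self.U, prev)  # information from the previous frame
                pre = [d + r + b for d, r, b in zip(drive, rec, self.b)]
            prev = [sigmoid(x) for x in pre]
            states.append(prev)
        return states


def extend_bandwidth(lf_frames: List[Vector], layers: List[RTRBMLayer],
                     out_w: Matrix, out_b: Vector) -> List[Vector]:
    """Map per-frame LF envelope features to HF envelope features (hypothetical)."""
    x = lf_frames
    for layer in layers:  # two stacked RTRBM layers
        x = layer.forward(x)
    # Linear output layer standing in for the feedforward NN.
    return [[y + c for y, c in zip(matvec(out_w, h), out_b)] for h in x]


rng = random.Random(0)
N_LF, N_HID, N_HF = 8, 16, 6  # hypothetical feature and hidden sizes
layers = [RTRBMLayer(N_LF, N_HID, rng), RTRBMLayer(N_HID, N_HID, rng)]
out_w = random_matrix(N_HF, N_HID, rng)
out_b = [0.0] * N_HF
lf = [[rng.gauss(0.0, 1.0) for _ in range(N_LF)] for _ in range(5)]  # 5 toy frames
hf = extend_bandwidth(lf, layers, out_w, out_b)
print(len(hf), len(hf[0]))  # 5 6
```

Setting every recurrent matrix `U` to zero reduces each layer to a frame-wise sigmoid layer, i.e. the kind of frame-independent mapping the abstract attributes to conventional DNN-based BWE. With a nonzero `U`, perturbing one input frame also changes the predicted HF features of later frames.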

Original language	English
Article number	7676312
Pages (from-to)	1877-1881
Number of pages	5
Journal	IEEE Signal Processing Letters
Volume	23
Issue number	12
DOI	https://doi.org/10.1109/LSP.2016.2621053
Publication status	Published - Dec 2016


Cite this

Wang, Y., Zhao, S., Li, J., & Kuang, J. (2016). Speech Bandwidth Extension Using Recurrent Temporal Restricted Boltzmann Machines. IEEE Signal Processing Letters, 23(12), 1877-1881. Article 7676312. https://doi.org/10.1109/LSP.2016.2621053

@article{7883e1862dff45c78809fd0591375922,

title = "Speech Bandwidth Extension Using Recurrent Temporal Restricted Boltzmann Machines",

abstract = "In this paper, we present a new speech bandwidth extension method (BWE) using recurrent temporal restricted Boltzmann machine (RTRBM). The conventional Gaussian mixture model (GMM)-based and deep neural networks (DNNs)-based BWE methods perform stably and effectively. However, the mapping function of GMM-based methods is a piecewise linear transformation, which is insufficient to model the complex nonlinear mapping relationship between the spectral envelope features of low frequency (LF) and high frequency (HF). In the conventional DNNs methods, temporal correlations across speech frames are ignored, resulting in spectral detail loss of the reconstructed speech by BWE. To counter these issues, a multilayer DNN which is composed of two RTRBMs and a feedforward neural network (NN) is employed to obtain time information and model deep nonlinear relationships between the spectral envelope features of LF and HF. The proposed method takes advantage of the strong ability of RTRBM in discovering the temporal correlation in the high-order space and modeling deep nonlinear relationships between input and output. Both the objective and subjective evaluations indicate that our proposed method outperforms the conventional GMM-based methods and other NN-based methods.",

keywords = "Deep neural networks (DNNs), Gaussian mixture model (GMM), recurrent temporal restricted Boltzmann machines (RTRBMs), speech bandwidth extension (BWE)",

author = "Yingxue Wang and Shenghui Zhao and Jianxin Li and Jingming Kuang",

note = "Publisher Copyright: {\textcopyright} 1994-2012 IEEE.",

year = "2016",

month = dec,

doi = "10.1109/LSP.2016.2621053",

language = "English",

volume = "23",

pages = "1877--1881",

journal = "IEEE Signal Processing Letters",

issn = "1070-9908",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

number = "12",

}

TY  - JOUR

T1  - Speech Bandwidth Extension Using Recurrent Temporal Restricted Boltzmann Machines

AU  - Wang, Yingxue

AU  - Zhao, Shenghui

AU  - Li, Jianxin

AU  - Kuang, Jingming

PY  - 2016/12

Y1  - 2016/12

N2  - In this paper, we present a new speech bandwidth extension method (BWE) using recurrent temporal restricted Boltzmann machine (RTRBM). The conventional Gaussian mixture model (GMM)-based and deep neural networks (DNNs)-based BWE methods perform stably and effectively. However, the mapping function of GMM-based methods is a piecewise linear transformation, which is insufficient to model the complex nonlinear mapping relationship between the spectral envelope features of low frequency (LF) and high frequency (HF). In the conventional DNNs methods, temporal correlations across speech frames are ignored, resulting in spectral detail loss of the reconstructed speech by BWE. To counter these issues, a multilayer DNN which is composed of two RTRBMs and a feedforward neural network (NN) is employed to obtain time information and model deep nonlinear relationships between the spectral envelope features of LF and HF. The proposed method takes advantage of the strong ability of RTRBM in discovering the temporal correlation in the high-order space and modeling deep nonlinear relationships between input and output. Both the objective and subjective evaluations indicate that our proposed method outperforms the conventional GMM-based methods and other NN-based methods.

AB  - In this paper, we present a new speech bandwidth extension method (BWE) using recurrent temporal restricted Boltzmann machine (RTRBM). The conventional Gaussian mixture model (GMM)-based and deep neural networks (DNNs)-based BWE methods perform stably and effectively. However, the mapping function of GMM-based methods is a piecewise linear transformation, which is insufficient to model the complex nonlinear mapping relationship between the spectral envelope features of low frequency (LF) and high frequency (HF). In the conventional DNNs methods, temporal correlations across speech frames are ignored, resulting in spectral detail loss of the reconstructed speech by BWE. To counter these issues, a multilayer DNN which is composed of two RTRBMs and a feedforward neural network (NN) is employed to obtain time information and model deep nonlinear relationships between the spectral envelope features of LF and HF. The proposed method takes advantage of the strong ability of RTRBM in discovering the temporal correlation in the high-order space and modeling deep nonlinear relationships between input and output. Both the objective and subjective evaluations indicate that our proposed method outperforms the conventional GMM-based methods and other NN-based methods.

KW  - Deep neural networks (DNNs)

KW  - Gaussian mixture model (GMM)

KW  - recurrent temporal restricted Boltzmann machines (RTRBMs)

KW  - speech bandwidth extension (BWE)

UR  - http://www.scopus.com/inward/record.url?scp=85006059009&partnerID=8YFLogxK

U2  - 10.1109/LSP.2016.2621053

DO  - 10.1109/LSP.2016.2621053

M3  - Article

AN  - SCOPUS:85006059009

SN  - 1070-9908

VL  - 23

SP  - 1877

EP  - 1881

JO  - IEEE Signal Processing Letters

JF  - IEEE Signal Processing Letters

IS  - 12

M1  - 7676312

ER  - 
