TY - JOUR
T1 - Speech Bandwidth Extension Using Recurrent Temporal Restricted Boltzmann Machines
AU - Wang, Yingxue
AU - Zhao, Shenghui
AU - Li, Jianxin
AU - Kuang, Jingming
N1 - Publisher Copyright: © 1994-2012 IEEE.
PY - 2016/12
Y1 - 2016/12
AB - In this paper, we present a new speech bandwidth extension (BWE) method based on the recurrent temporal restricted Boltzmann machine (RTRBM). Conventional Gaussian mixture model (GMM)-based and deep neural network (DNN)-based BWE methods perform stably and effectively. However, the mapping function of GMM-based methods is a piecewise linear transformation, which is insufficient to model the complex nonlinear mapping between the spectral envelope features of the low-frequency (LF) and high-frequency (HF) bands. Conventional DNN-based methods ignore temporal correlations across speech frames, which causes a loss of spectral detail in the speech reconstructed by BWE. To address these issues, a multilayer DNN composed of two RTRBMs and a feedforward neural network (NN) is employed to capture temporal information and to model the deep nonlinear relationship between the LF and HF spectral envelope features. The proposed method exploits the strong ability of the RTRBM to discover temporal correlations in high-order space and to model deep nonlinear relationships between input and output. Both objective and subjective evaluations indicate that the proposed method outperforms conventional GMM-based methods and other NN-based methods.
KW - Deep neural networks (DNNs)
KW - Gaussian mixture model (GMM)
KW - recurrent temporal restricted Boltzmann machines (RTRBMs)
KW - speech bandwidth extension (BWE)
UR - http://www.scopus.com/inward/record.url?scp=85006059009&partnerID=8YFLogxK
DO - 10.1109/LSP.2016.2621053
M3 - Article
AN - SCOPUS:85006059009
SN - 1070-9908
VL - 23
SP - 1877
EP - 1881
JO - IEEE Signal Processing Letters
JF - IEEE Signal Processing Letters
IS - 12
M1 - 7676312
ER -