TY - CPAPER
T1 - Recurrent neural network for spectral mapping in speech bandwidth extension
AU - Wang, Yingxue
AU - Zhao, Shenghui
AU - Li, Jianxin
AU - Kuang, Jingming
AU - Zhu, Qiang
N1 - Publisher Copyright:
© 2016 IEEE.
PY - 2017/4/19
Y1 - 2017/4/19
N2 - We present a recurrent neural network (RNN) based speech bandwidth extension (BWE) method. Conventional Gaussian mixture model (GMM) based BWE methods perform stably and effectively. However, GMM based methods suffer from two fundamental problems: 1) the inadequacy of the GMM in modeling the non-linear relationship between the low-frequency (LF) and high-frequency (HF) components, and 2) the neglect of temporal correlations across speech frames, which results in loss of spectral detail in the speech reconstructed by BWE. To cope with these problems, an RNN is employed to capture temporal information and construct deep non-linear relationships between the spectral envelope features of the LF and HF. The proposed RNN is trained layer-by-layer from a cascade of two recurrent temporal restricted Boltzmann machines (RTRBMs) and a feedforward neural network (NN). The proposed method takes advantage of the strong ability of RTRBMs to discover temporal correlations between adjacent frames and to model deep non-linear relationships between input and output. Both objective and subjective evaluations indicate that the proposed method outperforms conventional GMM based methods and other NN based methods.
AB - We present a recurrent neural network (RNN) based speech bandwidth extension (BWE) method. Conventional Gaussian mixture model (GMM) based BWE methods perform stably and effectively. However, GMM based methods suffer from two fundamental problems: 1) the inadequacy of the GMM in modeling the non-linear relationship between the low-frequency (LF) and high-frequency (HF) components, and 2) the neglect of temporal correlations across speech frames, which results in loss of spectral detail in the speech reconstructed by BWE. To cope with these problems, an RNN is employed to capture temporal information and construct deep non-linear relationships between the spectral envelope features of the LF and HF. The proposed RNN is trained layer-by-layer from a cascade of two recurrent temporal restricted Boltzmann machines (RTRBMs) and a feedforward neural network (NN). The proposed method takes advantage of the strong ability of RTRBMs to discover temporal correlations between adjacent frames and to model deep non-linear relationships between input and output. Both objective and subjective evaluations indicate that the proposed method outperforms conventional GMM based methods and other NN based methods.
KW - Feedforward neural network
KW - Gaussian mixture model
KW - Recurrent neural network
KW - Recurrent temporal restricted Boltzmann machine
KW - Speech bandwidth extension
UR - http://www.scopus.com/inward/record.url?scp=85043695997&partnerID=8YFLogxK
U2 - 10.1109/GlobalSIP.2016.7905840
DO - 10.1109/GlobalSIP.2016.7905840
M3 - Conference contribution
AN - SCOPUS:85043695997
T3 - 2016 IEEE Global Conference on Signal and Information Processing, GlobalSIP 2016 - Proceedings
SP - 242
EP - 246
BT - 2016 IEEE Global Conference on Signal and Information Processing, GlobalSIP 2016 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2016 IEEE Global Conference on Signal and Information Processing, GlobalSIP 2016
Y2 - 7 December 2016 through 9 December 2016
ER -