Speech Bandwidth Extension Using Recurrent Temporal Restricted Boltzmann Machines

Yingxue Wang, Shenghui Zhao*, Jianxin Li, Jingming Kuang

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-reviewed

14 citations (Scopus)

Abstract

In this paper, we present a new speech bandwidth extension (BWE) method using the recurrent temporal restricted Boltzmann machine (RTRBM). Conventional Gaussian mixture model (GMM)-based and deep neural network (DNN)-based BWE methods perform stably and effectively. However, the mapping function of GMM-based methods is a piecewise linear transformation, which is insufficient to model the complex nonlinear mapping between the spectral envelope features of the low-frequency (LF) and high-frequency (HF) bands. Conventional DNN-based methods ignore temporal correlations across speech frames, which causes a loss of spectral detail in the speech reconstructed by BWE. To counter these issues, a multilayer DNN composed of two RTRBMs and a feedforward neural network (NN) is employed to capture temporal information and to model deep nonlinear relationships between the spectral envelope features of LF and HF. The proposed method takes advantage of the strong ability of the RTRBM to discover temporal correlations in a high-order space and to model deep nonlinear relationships between input and output. Both objective and subjective evaluations indicate that the proposed method outperforms conventional GMM-based methods and other NN-based methods.
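The stacked architecture described in the abstract can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation: it assumes the standard RTRBM mean-field hidden recurrence, h_t = sigmoid(W v_t + U h_{t-1} + b), uses random untrained weights, and the feature dimensions (12 LF features, 8 HF features, 32 hidden units) and all function names are hypothetical choices made only for illustration. Training is omitted entirely.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rtrbm_hidden_means(V, W, U, b, b_init):
    """Mean-field hidden activations of one RTRBM over a frame sequence.

    V: (T, n_vis) input frames, W: (n_hid, n_vis), U: (n_hid, n_hid).
    Each hidden state depends on the current frame and the previous
    hidden state, which is how temporal context enters the model.
    """
    T = V.shape[0]
    H = np.zeros((T, W.shape[0]))
    H[0] = sigmoid(W @ V[0] + b_init)
    for t in range(1, T):
        H[t] = sigmoid(W @ V[t] + U @ H[t - 1] + b)
    return H

def init_params(n_lf, n_h1, n_h2, n_hf, seed=0):
    """Random placeholder weights (assumption: sizes are illustrative)."""
    rng = np.random.default_rng(seed)

    def rtrbm(n_vis, n_hid):
        return (0.1 * rng.standard_normal((n_hid, n_vis)),
                0.1 * rng.standard_normal((n_hid, n_hid)),
                np.zeros(n_hid),
                np.zeros(n_hid))

    return {
        "rtrbm1": rtrbm(n_lf, n_h1),
        "rtrbm2": rtrbm(n_h1, n_h2),
        "out": (0.1 * rng.standard_normal((n_hf, n_h2)), np.zeros(n_hf)),
    }

def stacked_forward(lf, params):
    """Two RTRBM layers followed by a linear feedforward output layer."""
    h1 = rtrbm_hidden_means(lf, *params["rtrbm1"])
    h2 = rtrbm_hidden_means(h1, *params["rtrbm2"])
    W_out, b_out = params["out"]
    return h2 @ W_out.T + b_out

params = init_params(n_lf=12, n_h1=32, n_h2=32, n_hf=8)
lf_frames = np.random.default_rng(1).standard_normal((50, 12))
hf_est = stacked_forward(lf_frames, params)
print(hf_est.shape)
```

Because each hidden state feeds into the next, changing an early LF frame changes the HF estimates for later frames, unlike a frame-by-frame feedforward mapping.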

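Original language: English
Article number: 7676312
Pages (from-to): 1877-1881
Number of pages: 5
Journal: IEEE Signal Processing Letters
Volume: 23
Issue number: 12
DOI
Publication status: Published - Dec 2016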
