Abstract
In this paper, we present a new speech bandwidth extension (BWE) method based on the recurrent temporal restricted Boltzmann machine (RTRBM). Conventional BWE methods based on Gaussian mixture models (GMMs) and deep neural networks (DNNs) perform stably and effectively. However, the mapping function of GMM-based methods is a piecewise linear transformation, which is insufficient to model the complex nonlinear mapping between the low-frequency (LF) and high-frequency (HF) spectral envelope features. Conventional DNN-based methods ignore temporal correlations across speech frames, which causes a loss of spectral detail in the reconstructed speech. To address these issues, we employ a multilayer DNN composed of two RTRBMs and a feedforward neural network (NN) to capture temporal information and to model the deep nonlinear relationship between the LF and HF spectral envelope features. The proposed method exploits the strong ability of the RTRBM to discover temporal correlations in a high-order space and to model deep nonlinear relationships between input and output. Both objective and subjective evaluations indicate that the proposed method outperforms conventional GMM-based methods and other NN-based methods.
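To illustrate why a GMM-based mapping is described as piecewise linear, the following is a minimal sketch of the standard GMM conditional-mean regression, reduced to scalar LF and HF features for readability. It is not the implementation evaluated in the paper: the function names, the dictionary keys, and the one-dimensional simplification are all assumptions made for illustration only.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gmm_map(x, components):
    """Estimate an HF feature y from an LF feature x (both scalars here).

    Each component is a dict with hypothetical keys:
      weight  mixture weight
      mu_x    mean of the LF feature
      mu_y    mean of the HF feature
      var_x   variance of the LF feature
      cov_yx  covariance between the HF and LF features
    """
    # Weighted likelihood of x under each component's LF marginal.
    likelihoods = [
        c["weight"] * gaussian_pdf(x, c["mu_x"], c["var_x"]) for c in components
    ]
    total = sum(likelihoods)

    y = 0.0
    for c, lik in zip(components, likelihoods):
        posterior = lik / total  # P(component | x)
        # Each component contributes its own linear (affine) regression of y on x.
        local_linear = c["mu_y"] + (c["cov_yx"] / c["var_x"]) * (x - c["mu_x"])
        y += posterior * local_linear
    return y
```

Each component applies its own affine transform of the input, and the component posteriors only decide how those transforms are blended. With a single component the mapping is exactly linear, and with well-separated components it is close to linear within each region, which is the limitation the abstract attributes to GMM-based BWE.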
Original language | English |
---|---|
Article number | 7676312 |
Pages (from-to) | 1877-1881 |
Number of pages | 5 |
Journal | IEEE Signal Processing Letters |
Volume | 23 |
Issue number | 12 |
Publication status | Published - Dec 2016 |
Keywords
- Deep neural networks (DNNs)
- Gaussian mixture model (GMM)
- Recurrent temporal restricted Boltzmann machines (RTRBMs)
- Speech bandwidth extension (BWE)