Speech Bandwidth Extension Using Recurrent Temporal Restricted Boltzmann Machines

Yingxue Wang, Shenghui Zhao*, Jianxin Li, Jingming Kuang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

14 Citations (Scopus)

Abstract

In this paper, we present a new speech bandwidth extension (BWE) method using recurrent temporal restricted Boltzmann machines (RTRBMs). Conventional Gaussian mixture model (GMM)-based and deep neural network (DNN)-based BWE methods perform stably and effectively. However, the mapping function of GMM-based methods is a piecewise linear transformation, which is insufficient to model the complex nonlinear mapping between the spectral envelope features of the low-frequency (LF) and high-frequency (HF) bands. Conventional DNN-based methods ignore temporal correlations across speech frames, which causes a loss of spectral detail in the reconstructed speech. To counter these issues, a multilayer DNN composed of two RTRBMs and a feedforward neural network (NN) is employed to capture temporal information and to model deep nonlinear relationships between the LF and HF spectral envelope features. The proposed method takes advantage of the strong ability of the RTRBM to discover temporal correlations in a high-order space and to model deep nonlinear relationships between input and output. Both objective and subjective evaluations indicate that the proposed method outperforms conventional GMM-based methods and other NN-based methods.

Original language: English
Article number: 7676312
Pages (from-to): 1877-1881
Number of pages: 5
Journal: IEEE Signal Processing Letters
Volume: 23
Issue number: 12
Publication status: Published - Dec 2016

Keywords

  • Deep neural networks (DNNs)
  • Gaussian mixture model (GMM)
  • recurrent temporal restricted Boltzmann machines (RTRBMs)
  • speech bandwidth extension (BWE)
