Recurrent neural network for spectral mapping in speech bandwidth extension

Yingxue Wang, Shenghui Zhao, Jianxin Li, Jingming Kuang, Qiang Zhu

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

4 Citations (Scopus)

Abstract

We present a recurrent neural network (RNN) based speech bandwidth extension (BWE) method. Conventional Gaussian mixture model (GMM) based BWE methods perform stably and effectively. However, they suffer from two fundamental problems: 1) the GMM is inadequate for modeling the non-linear relationship between the low-frequency (LF) and high-frequency (HF) bands, and 2) temporal correlations across speech frames are ignored, which causes a loss of spectral detail in the speech reconstructed by BWE. To cope with these problems, an RNN is employed to capture temporal information and to construct deep non-linear relationships between the spectral envelope features of the LF and HF bands. The proposed RNN is trained layer by layer from a cascade of two recurrent temporal restricted Boltzmann machines (RTRBMs) and a feedforward neural network (NN). The proposed method takes advantage of the strong ability of RTRBMs to discover the temporal correlation between adjacent frames and to model deep non-linear relationships between input and output. Both objective and subjective evaluations indicate that the proposed method outperforms conventional GMM based methods and other NN based methods.
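The LF-to-HF spectral mapping described above can be illustrated with a minimal forward-pass sketch. It assumes that each of the two RTRBM layers is used at inference time through the deterministic hidden-state recursion of the standard RTRBM, r_t = sigmoid(W x_t + U r_{t-1} + b), with a separate initial bias b_init for the first frame, and that the feedforward NN is a single linear output layer. The class names `RecurrentLayer` and `SpectralMapper`, the feature and layer sizes, and the random weights are hypothetical placeholders: the layer-wise RTRBM pretraining and any fine-tuning are not shown, and the abstract does not state the actual dimensions.

```python
import math
import random
from typing import List, Optional

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def random_matrix(rows: int, cols: int, rng: random.Random, scale: float = 0.1) -> Matrix:
    # Placeholder weights; a real system would load pretrained RTRBM weights.
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class RecurrentLayer:
    """Hypothetical layer using the RTRBM hidden-state recursion:
    r_1 = sigmoid(W x_1 + b_init), r_t = sigmoid(W x_t + U r_{t-1} + b)."""

    def __init__(self, n_in: int, n_hidden: int, rng: random.Random) -> None:
        self.W = random_matrix(n_hidden, n_in, rng)      # input-to-hidden weights
        self.U = random_matrix(n_hidden, n_hidden, rng)  # hidden-to-hidden (temporal) weights
        self.b = [0.0] * n_hidden
        self.b_init = [0.0] * n_hidden

    def run(self, frames: List[Vector]) -> List[Vector]:
        outputs: List[Vector] = []
        prev: Optional[Vector] = None
        for x in frames:
            drive = matvec(self.W, x)
            if prev is None:
                # First frame: no previous hidden state, use the initial bias.
                pre = [d + bi for d, bi in zip(drive, self.b_init)]
            else:
                rec = matvec(self.U, prev)
                pre = [d + r + b for d, r, b in zip(drive, rec, self.b)]
            prev = [sigmoid(p) for p in pre]
            outputs.append(prev)
        return outputs


class SpectralMapper:
    """Two stacked recurrent layers followed by an assumed linear output layer
    that maps hidden states to HF spectral envelope features."""

    def __init__(self, n_lf: int, n_hidden: int, n_hf: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.layer1 = RecurrentLayer(n_lf, n_hidden, rng)
        self.layer2 = RecurrentLayer(n_hidden, n_hidden, rng)
        self.A = random_matrix(n_hf, n_hidden, rng)
        self.c = [0.0] * n_hf

    def predict(self, lf_frames: List[Vector]) -> List[Vector]:
        h1 = self.layer1.run(lf_frames)
        h2 = self.layer2.run(h1)
        return [[a + c for a, c in zip(matvec(self.A, h), self.c)] for h in h2]


if __name__ == "__main__":
    mapper = SpectralMapper(n_lf=4, n_hidden=6, n_hf=3, seed=0)
    # Five hypothetical LF envelope frames of dimension 4.
    lf = [[0.1 * (t + k) for k in range(4)] for t in range(5)]
    hf = mapper.predict(lf)
    print(len(hf), len(hf[0]))  # 5 3
```

Because each layer sees only the current frame and its own previous state, the sketch is causal: the HF estimate for frame t depends on LF frames 1 to t. This recurrence is what carries the inter-frame context that a frame-by-frame GMM mapping ignores.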

Original language: English
Title of host publication: 2016 IEEE Global Conference on Signal and Information Processing, GlobalSIP 2016 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 242-246
Number of pages: 5
ISBN (Electronic): 9781509045457
Publication status: Published - 19 Apr 2017
Event: 2016 IEEE Global Conference on Signal and Information Processing, GlobalSIP 2016 - Washington, United States
Duration: 7 Dec 2016 – 9 Dec 2016

Publication series

Name: 2016 IEEE Global Conference on Signal and Information Processing, GlobalSIP 2016 - Proceedings
Volume: 2017-April

Conference

Conference: 2016 IEEE Global Conference on Signal and Information Processing, GlobalSIP 2016
Country/Territory: United States
City: Washington
Period: 7/12/16 – 9/12/16

Keywords

  • Feedforward neural network
  • Gaussian mixture model
  • Recurrent neural network
  • Recurrent temporal restricted Boltzmann machine
  • Speech bandwidth extension
