Speech bandwidth expansion based on deep neural networks

Yingxue Wang, Shenghui Zhao, Wenbo Liu, Ming Li, Jingming Kuang

Research output: Contribution to journal (conference article), peer-reviewed

38 Citations (Scopus)

Abstract

This paper proposes a new speech bandwidth expansion method that uses Deep Neural Networks (DNNs) to build high-order eigenspaces between the low-frequency and high-frequency components of the speech signal. A four-layer DNN is trained layer by layer from a cascade of Neural Networks (NNs) and two Gaussian-Bernoulli Restricted Boltzmann Machines (GBRBMs). The GBRBMs model the distributions of the spectral envelopes of the low-frequency and high-frequency bands, respectively, and the NNs model the joint distribution of the hidden variables extracted from the two GBRBMs. The method thus exploits the strong ability of GBRBMs to model the distribution of spectral envelopes. Both objective and subjective test results show that the proposed method outperforms the conventional GMM-based method.
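The cascade described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation, and several details are assumptions: the four layers are taken to be the low-band envelope input, the low-band GBRBM's hidden layer, the high-band GBRBM's hidden layer, and the high-band envelope output; the GBRBMs use unit-variance visible units (inputs standardized) and CD-1 with a mean-field reconstruction; the mapping network is a single hypothetical sigmoid layer trained with cross-entropy. The feature matrices are synthetic stand-ins for real spectral envelopes, and no joint fine-tuning step is shown.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GBRBM:
    """Gaussian-Bernoulli RBM with unit-variance visible units (assumes standardized inputs)."""

    def __init__(self, n_visible, n_hidden, rng):
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b = np.zeros(n_visible)  # bias of the Gaussian visible units
        self.c = np.zeros(n_hidden)   # bias of the Bernoulli hidden units
        self.rng = rng

    def hidden_probs(self, v):
        # p(h_j = 1 | v) = sigmoid(c_j + sum_i v_i W_ij) when sigma_i = 1
        return sigmoid(v @ self.W + self.c)

    def visible_mean(self, h):
        # E[v | h] = b + W h when sigma_i = 1
        return self.b + h @ self.W.T

    def train_cd1(self, data, epochs=50, lr=0.01):
        n = data.shape[0]
        for _ in range(epochs):
            h0 = self.hidden_probs(data)
            h0_sample = (self.rng.random(h0.shape) < h0).astype(float)
            v1 = self.visible_mean(h0_sample)  # mean-field reconstruction (simplification)
            h1 = self.hidden_probs(v1)
            self.W += lr * (data.T @ h0 - v1.T @ h1) / n
            self.b += lr * (data - v1).mean(axis=0)
            self.c += lr * (h0 - h1).mean(axis=0)

def train_mapping(h_low, h_high, rng, epochs=200, lr=0.1):
    """Hypothetical single sigmoid layer mapping low-band hidden codes to high-band ones."""
    n, d_in = h_low.shape
    W = 0.01 * rng.standard_normal((d_in, h_high.shape[1]))
    b = np.zeros(h_high.shape[1])
    losses = []
    for _ in range(epochs):
        pred = sigmoid(h_low @ W + b)
        p = np.clip(pred, 1e-7, 1 - 1e-7)
        # Cross-entropy against the high-band GBRBM's hidden probabilities
        loss = -np.mean(np.sum(h_high * np.log(p) + (1 - h_high) * np.log(1 - p), axis=1))
        losses.append(loss)
        grad = pred - h_high  # gradient of the loss w.r.t. the pre-activation (per sample)
        W -= lr * h_low.T @ grad / n
        b -= lr * grad.mean(axis=0)
    return W, b, losses

def expand(v_low, rbm_low, W_map, b_map, rbm_high):
    """Stacked forward pass: low envelope -> h_low -> h_high -> high envelope."""
    h_low = rbm_low.hidden_probs(v_low)
    h_high = sigmoid(h_low @ W_map + b_map)
    return rbm_high.visible_mean(h_high)

# Synthetic, standardized stand-ins for low- and high-band envelope features
rng = np.random.default_rng(0)
n, d_low, d_high, n_hid = 500, 20, 12, 16
x_low = rng.standard_normal((n, d_low))
x_high = 0.3 * x_low @ rng.standard_normal((d_low, d_high)) + 0.1 * rng.standard_normal((n, d_high))
x_high = (x_high - x_high.mean(axis=0)) / x_high.std(axis=0)

# Layer-wise training: one GBRBM per band, then the mapping between their hidden codes
rbm_low = GBRBM(d_low, n_hid, rng)
rbm_low.train_cd1(x_low)
rbm_high = GBRBM(d_high, n_hid, rng)
rbm_high.train_cd1(x_high)
W_map, b_map, losses = train_mapping(rbm_low.hidden_probs(x_low), rbm_high.hidden_probs(x_high), rng)

y = expand(x_low, rbm_low, W_map, b_map, rbm_high)
print(y.shape)  # (500, 12)
```

The design choice worth noting is that the high-band GBRBM is used "in reverse": its hidden layer is predicted from the low band, and its transposed weights decode that hidden code back into a high-band envelope estimate.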

Original language: English
Pages (from-to): 2593-2597
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2015-January
Publication status: Published - 2015
Event: 16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015 - Dresden, Germany
Duration: 6 Sept 2015 - 10 Sept 2015

Keywords

  • Bandwidth extension
  • Deep neural networks
  • Gaussian-Bernoulli Restricted Boltzmann Machine
  • Neural networks
