Speech bandwidth extension based on restricted Boltzmann machines

Yingxue Wang, Shenghui Zhao*, Yingying Yu, Jingming Kuang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

4 Citations (Scopus)

Abstract

Speech Bandwidth Extension (BWE) is a technique that attempts to improve speech quality by recovering the missing High Frequency (HF) components, using the correlation that exists between the Low Frequency (LF) and HF parts of the wide-band speech signal. Gaussian Mixture Model (GMM) based methods are widely used, but they recover the missing HF components under the assumption that the LF and HF parts obey a Gaussian distribution and model the relationship between them as linear, which leads to distortion in the reconstructed speech. This study proposes a new speech BWE method that uses two Gaussian-Bernoulli Restricted Boltzmann Machines (GBRBMs) to extract the high-order statistical characteristics of the spectral envelopes of the LF and HF parts, respectively. The high-order features of the LF part are then mapped to those of the HF part using a Feedforward Neural Network (FNN). By extracting the high-order statistical characteristics of both the LF and HF components, the proposed method learns a deep relationship between their spectral envelopes and models the distribution of the spectral envelopes more precisely. Objective and subjective test results show that the proposed method outperforms the conventional GMM based method.
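
The pipeline described in the abstract (one GBRBM per band, with an FNN mapping LF hidden features to HF hidden features) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give feature dimensions, layer sizes, or training details, so the unit-variance visible units, the CD-1 update, the single-hidden-layer FNN, the synthetic "envelopes", and all names (`GBRBM`, `train_fnn`, `extend_bandwidth`) are assumptions made for illustration only.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GBRBM:
    """Gaussian-Bernoulli RBM; visible units assumed to have unit variance."""
    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b = np.zeros(n_visible)  # visible bias
        self.c = np.zeros(n_hidden)   # hidden bias

    def hidden_probs(self, v):
        # P(h = 1 | v) for real-valued visible vectors
        return sigmoid(v @ self.W + self.c)

    def visible_mean(self, h):
        # Mean of the Gaussian P(v | h)
        return h @ self.W.T + self.b

    def cd1_step(self, v0, lr=0.01):
        # One contrastive-divergence (CD-1) update on a batch
        h0 = self.hidden_probs(v0)
        h_sample = (self.rng.random(h0.shape) < h0).astype(float)
        v1 = self.visible_mean(h_sample)
        h1 = self.hidden_probs(v1)
        n = v0.shape[0]
        self.W += lr * (v0.T @ h0 - v1.T @ h1) / n
        self.b += lr * (v0 - v1).mean(axis=0)
        self.c += lr * (h0 - h1).mean(axis=0)

def train_fnn(x, y, n_hidden, rng, epochs=200, lr=0.1):
    """One-hidden-layer FNN (tanh, sigmoid output) trained on squared error."""
    w1 = 0.1 * rng.standard_normal((x.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    w2 = 0.1 * rng.standard_normal((n_hidden, y.shape[1]))
    b2 = np.zeros(y.shape[1])
    n = x.shape[0]
    for _ in range(epochs):
        a = np.tanh(x @ w1 + b1)
        out = sigmoid(a @ w2 + b2)
        d_out = (out - y) * out * (1.0 - out) / n  # backprop through sigmoid
        d_a = (d_out @ w2.T) * (1.0 - a ** 2)      # backprop through tanh
        w2 -= lr * (a.T @ d_out)
        b2 -= lr * d_out.sum(axis=0)
        w1 -= lr * (x.T @ d_a)
        b1 -= lr * d_a.sum(axis=0)
    return lambda x_new: sigmoid(np.tanh(x_new @ w1 + b1) @ w2 + b2)

def extend_bandwidth(lf_env, lf_rbm, hf_rbm, fnn):
    """LF envelope -> LF features -> HF features -> estimated HF envelope."""
    return hf_rbm.visible_mean(fnn(lf_rbm.hidden_probs(lf_env)))

# Synthetic stand-ins for normalized LF/HF spectral-envelope frames.
rng = np.random.default_rng(0)
lf = rng.standard_normal((200, 10))
hf = lf @ rng.standard_normal((10, 6)) + 0.1 * rng.standard_normal((200, 6))
hf = (hf - hf.mean(axis=0)) / hf.std(axis=0)

lf_rbm = GBRBM(10, 16, rng)
hf_rbm = GBRBM(6, 16, rng)
for _ in range(50):
    lf_rbm.cd1_step(lf)
    hf_rbm.cd1_step(hf)

fnn = train_fnn(lf_rbm.hidden_probs(lf), hf_rbm.hidden_probs(hf), 32, rng)
hf_est = extend_bandwidth(lf, lf_rbm, hf_rbm, fnn)
```

Here the two RBMs are trained independently on their own bands and the FNN is fitted afterwards on the resulting hidden-unit probabilities, which is one plausible reading of the abstract rather than a confirmed training recipe.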

Original language: English
Pages (from-to): 1717-1723
Number of pages: 7
Journal: Dianzi Yu Xinxi Xuebao/Journal of Electronics and Information Technology
Volume: 38
Issue number: 7
Publication status: Published - 1 Jul 2016

Keywords

  • Feedforward Neural Networks (FNN)
  • Gaussian mixture model
  • Restricted Boltzmann machines
  • Speech bandwidth extension
