Speech Bandwidth Extension Using Recurrent Temporal Restricted Boltzmann Machines

Yingxue Wang; Shenghui Zhao; Jianxin Li; Jingming Kuang

doi:10.1109/LSP.2016.2621053

Yingxue Wang, Shenghui Zhao*, Jianxin Li, Jingming Kuang

*Corresponding author for this work

School of Information and Electronics

Research output: Contribution to journal › Article › peer-review

14 Citations (Scopus)

Abstract

In this paper, we present a new speech bandwidth extension (BWE) method using recurrent temporal restricted Boltzmann machines (RTRBMs). Conventional Gaussian mixture model (GMM)-based and deep neural network (DNN)-based BWE methods perform stably and effectively. However, the mapping function of GMM-based methods is a piecewise linear transformation, which is insufficient to model the complex nonlinear mapping between the spectral envelope features of the low-frequency (LF) and high-frequency (HF) bands. Conventional DNN-based methods ignore temporal correlations across speech frames, which causes a loss of spectral detail in the reconstructed speech. To counter these issues, a multilayer DNN composed of two RTRBMs and a feedforward neural network (NN) is employed to capture temporal information and to model the deep nonlinear relationship between the LF and HF spectral envelope features. The proposed method takes advantage of the strong ability of the RTRBM to discover temporal correlations in a high-order space and to model deep nonlinear relationships between input and output. Both objective and subjective evaluations indicate that the proposed method outperforms conventional GMM-based methods and other NN-based methods.
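The temporal modeling idea in the abstract can be illustrated with a minimal sketch. It uses the standard RTRBM recurrence, in which the expected hidden state for frame t depends on the current visible frame and on the previous expected hidden state, r_t = sigmoid(W v_t + U r_{t-1} + b), and then maps each r_t to HF features through one linear output layer. This is only a forward pass through a single RTRBM-style layer under stated assumptions, not the authors' two-RTRBM network or its training procedure; all names, dimensions, and random weights below are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rtrbm_hidden_states(frames, W, U, b, b_init):
    """Expected hidden states of an RTRBM for a sequence of visible frames.

    r_0 = sigmoid(W v_0 + b_init)
    r_t = sigmoid(W v_t + U r_{t-1} + b)   for t >= 1
    """
    states = []
    prev = None
    for v in frames:
        pre = matvec(W, v)
        if prev is None:
            pre = [p + bi for p, bi in zip(pre, b_init)]
        else:
            rec = matvec(U, prev)  # contribution of the previous frame
            pre = [p + r + bi for p, r, bi in zip(pre, rec, b)]
        prev = [sigmoid(p) for p in pre]
        states.append(prev)
    return states


def linear_layer(h, V, c):
    """Hypothetical feedforward output layer: hidden state -> HF features."""
    return [y + ci for y, ci in zip(matvec(V, h), c)]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


# Hypothetical sizes: 4 LF envelope features, 3 hidden units, 2 HF features.
rng = random.Random(0)
LF_DIM, HID_DIM, HF_DIM, N_FRAMES = 4, 3, 2, 5
W = rand_matrix(HID_DIM, LF_DIM, rng)   # visible-to-hidden weights
U = rand_matrix(HID_DIM, HID_DIM, rng)  # hidden-to-hidden (temporal) weights
V = rand_matrix(HF_DIM, HID_DIM, rng)   # output-layer weights
b = [0.0] * HID_DIM
b_init = [0.0] * HID_DIM
c = [0.0] * HF_DIM

# Random stand-ins for a sequence of LF spectral envelope frames.
lf_frames = rand_matrix(N_FRAMES, LF_DIM, rng, scale=1.0)
hidden = rtrbm_hidden_states(lf_frames, W, U, b, b_init)
hf_estimates = [linear_layer(h, V, c) for h in hidden]
```

Setting U to zero removes the dependence on earlier frames, which corresponds to the frame-by-frame mapping that the abstract attributes to conventional DNN-based methods.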

Original language	English
Article number	7676312
Pages (from-to)	1877-1881
Number of pages	5
Journal	IEEE Signal Processing Letters
Volume	23
Issue number	12
DOIs	https://doi.org/10.1109/LSP.2016.2621053
Publication status	Published - Dec 2016

Keywords

Deep neural networks (DNNs)
Gaussian mixture model (GMM)
recurrent temporal restricted Boltzmann machines (RTRBMs)
speech bandwidth extension (BWE)

Cite this

@article{7883e1862dff45c78809fd0591375922,

title = "Speech Bandwidth Extension Using Recurrent Temporal Restricted Boltzmann Machines",

abstract = "In this paper, we present a new speech bandwidth extension (BWE) method using recurrent temporal restricted Boltzmann machines (RTRBMs). Conventional Gaussian mixture model (GMM)-based and deep neural network (DNN)-based BWE methods perform stably and effectively. However, the mapping function of GMM-based methods is a piecewise linear transformation, which is insufficient to model the complex nonlinear mapping between the spectral envelope features of the low-frequency (LF) and high-frequency (HF) bands. Conventional DNN-based methods ignore temporal correlations across speech frames, which causes a loss of spectral detail in the reconstructed speech. To counter these issues, a multilayer DNN composed of two RTRBMs and a feedforward neural network (NN) is employed to capture temporal information and to model the deep nonlinear relationship between the LF and HF spectral envelope features. The proposed method takes advantage of the strong ability of the RTRBM to discover temporal correlations in a high-order space and to model deep nonlinear relationships between input and output. Both objective and subjective evaluations indicate that the proposed method outperforms conventional GMM-based methods and other NN-based methods.",

keywords = "Deep neural networks (DNNs), Gaussian mixture model (GMM), recurrent temporal restricted Boltzmann machines (RTRBMs), speech bandwidth extension (BWE)",

author = "Yingxue Wang and Shenghui Zhao and Jianxin Li and Jingming Kuang",

note = "Publisher Copyright: {\textcopyright} 1994-2012 IEEE.",

year = "2016",

month = dec,

doi = "10.1109/LSP.2016.2621053",

language = "English",

volume = "23",

pages = "1877--1881",

journal = "IEEE Signal Processing Letters",

issn = "1070-9908",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

number = "12",

}

TY - JOUR

T1 - Speech Bandwidth Extension Using Recurrent Temporal Restricted Boltzmann Machines

AU - Wang, Yingxue

AU - Zhao, Shenghui

AU - Li, Jianxin

AU - Kuang, Jingming

PY - 2016/12

Y1 - 2016/12

N2 - In this paper, we present a new speech bandwidth extension (BWE) method using recurrent temporal restricted Boltzmann machines (RTRBMs). Conventional Gaussian mixture model (GMM)-based and deep neural network (DNN)-based BWE methods perform stably and effectively. However, the mapping function of GMM-based methods is a piecewise linear transformation, which is insufficient to model the complex nonlinear mapping between the spectral envelope features of the low-frequency (LF) and high-frequency (HF) bands. Conventional DNN-based methods ignore temporal correlations across speech frames, which causes a loss of spectral detail in the reconstructed speech. To counter these issues, a multilayer DNN composed of two RTRBMs and a feedforward neural network (NN) is employed to capture temporal information and to model the deep nonlinear relationship between the LF and HF spectral envelope features. The proposed method takes advantage of the strong ability of the RTRBM to discover temporal correlations in a high-order space and to model deep nonlinear relationships between input and output. Both objective and subjective evaluations indicate that the proposed method outperforms conventional GMM-based methods and other NN-based methods.

KW - Deep neural networks (DNNs)

KW - Gaussian mixture model (GMM)

KW - recurrent temporal restricted Boltzmann machines (RTRBMs)

KW - speech bandwidth extension (BWE)

UR - http://www.scopus.com/inward/record.url?scp=85006059009&partnerID=8YFLogxK

U2 - 10.1109/LSP.2016.2621053

DO - 10.1109/LSP.2016.2621053

M3 - Article

AN - SCOPUS:85006059009

SN - 1070-9908

VL - 23

SP - 1877

EP - 1881

JO - IEEE Signal Processing Letters

JF - IEEE Signal Processing Letters

IS - 12

M1 - 7676312

ER -
