TY - CPAPER
T1 - Recurrent neural network for spectral mapping in speech bandwidth extension
AU - Wang, Yingxue
AU - Zhao, Shenghui
AU - Li, Jianxin
AU - Kuang, Jingming
AU - Zhu, Qiang
N1 - Publisher Copyright:
© 2016 IEEE.
PY - 2017/4/19
Y1 - 2017/4/19
N2 - We present a recurrent neural network (RNN) based speech bandwidth extension (BWE) method. Conventional Gaussian mixture model (GMM) based BWE methods perform stably and effectively. However, GMM based methods suffer from two fundamental problems: 1) the inadequacy of the GMM in modeling the non-linear relationship between the low-frequency (LF) and high-frequency (HF) components, and 2) the neglect of temporal correlations across speech frames, which results in loss of spectral detail in the speech reconstructed by BWE. To cope with these problems, an RNN is employed to capture temporal information and construct deep non-linear relationships between the spectral envelope features of the LF and HF. The proposed RNN is trained layer-by-layer from a cascade of two recurrent temporal restricted Boltzmann machines (RTRBMs) and a feedforward neural network (NN). The proposed method takes advantage of the strong ability of RTRBMs to discover temporal correlations between adjacent frames and to model deep non-linear relationships between input and output. Both objective and subjective evaluations indicate that the proposed method outperforms conventional GMM based methods and other NN based methods.
AB - We present a recurrent neural network (RNN) based speech bandwidth extension (BWE) method. Conventional Gaussian mixture model (GMM) based BWE methods perform stably and effectively. However, GMM based methods suffer from two fundamental problems: 1) the inadequacy of the GMM in modeling the non-linear relationship between the low-frequency (LF) and high-frequency (HF) components, and 2) the neglect of temporal correlations across speech frames, which results in loss of spectral detail in the speech reconstructed by BWE. To cope with these problems, an RNN is employed to capture temporal information and construct deep non-linear relationships between the spectral envelope features of the LF and HF. The proposed RNN is trained layer-by-layer from a cascade of two recurrent temporal restricted Boltzmann machines (RTRBMs) and a feedforward neural network (NN). The proposed method takes advantage of the strong ability of RTRBMs to discover temporal correlations between adjacent frames and to model deep non-linear relationships between input and output. Both objective and subjective evaluations indicate that the proposed method outperforms conventional GMM based methods and other NN based methods.
KW - Feedforward neural network
KW - Gaussian mixture model
KW - Recurrent neural network
KW - Recurrent temporal restricted Boltzmann machine
KW - Speech bandwidth extension
UR - http://www.scopus.com/inward/record.url?scp=85043695997&partnerID=8YFLogxK
U2 - 10.1109/GlobalSIP.2016.7905840
DO - 10.1109/GlobalSIP.2016.7905840
M3 - Conference contribution
AN - SCOPUS:85043695997
T3 - 2016 IEEE Global Conference on Signal and Information Processing, GlobalSIP 2016 - Proceedings
SP - 242
EP - 246
BT - 2016 IEEE Global Conference on Signal and Information Processing, GlobalSIP 2016 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2016 IEEE Global Conference on Signal and Information Processing, GlobalSIP 2016
Y2 - 7 December 2016 through 9 December 2016
ER -