TY - GEN
T1 - Using conditional restricted Boltzmann machines for spectral envelope modeling in speech bandwidth extension
AU - Wang, Yingxue
AU - Zhao, Shenghui
AU - Qu, Dan
AU - Kuang, Jingming
N1 - Publisher Copyright: © 2016 IEEE.
PY - 2016/5/18
Y1 - 2016/5/18
AB - In this paper, we present a conditional restricted Boltzmann machine (CRBM) based speech bandwidth extension (BWE) method. A CRBM is employed to obtain time information and model deep non-linear relationships between the spectral envelope features of low frequency (LF) and high frequency (HF). Two exclusive CRBMs are adopted to model the distributions of the LF and HF spectral envelope features, respectively. A neural network (NN) is then used to model the joint distribution of hidden variables extracted from the two CRBMs. The proposed method takes advantage of the strong ability of the CRBM in discovering the temporal correlation between adjacent frames and modeling deep non-linear relationships between input and output. Both the objective and subjective evaluations indicate that our proposed method outperforms the conventional Gaussian mixture model based methods and other NN based methods.
KW - Gaussian mixture model
KW - Speech bandwidth expansion
KW - conditional restricted Boltzmann machine
UR - http://www.scopus.com/inward/record.url?scp=84973320896&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2016.7472815
DO - 10.1109/ICASSP.2016.7472815
M3 - Conference contribution
AN - SCOPUS:84973320896
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 5930
EP - 5934
BT - 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016
Y2 - 20 March 2016 through 25 March 2016
ER -