Speech bandwidth extension supported by temporal information

Yingxue Wang, Shenghui Zhao^*, Jingming Kuang

^*Corresponding author for this work

School of Information and Electronics

Research output: Contribution to journal › Article › Peer-reviewed

Abstract

Speech Bandwidth Extension (BWE) aims to improve speech quality by reconstructing the missing High Frequency (HF) components from the correlation between the Low Frequency (LF) and HF components of speech. Gaussian Mixture Model (GMM) based methods are widely used. However, the mapping function derived by a GMM is a piecewise linear transformation and ignores the temporal information of speech. A novel BWE method is therefore proposed that estimates the HF components of speech using Conditional Restricted Boltzmann Machines (CRBM). The proposed method uses the CRBM to capture temporal information and to model deep non-linear relationships between the spectral envelope features of LF and HF by building high-order eigen spaces between the LF and HF of the speech signal. Objective and subjective test results show that the proposed method outperforms the conventional GMM-based method.
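To make the "piecewise linear" remark about the GMM baseline concrete, the following is a minimal sketch of the textbook minimum mean-square-error mapping under a joint GMM, reduced to one scalar LF feature and one scalar HF feature per frame. It is not the paper's implementation: the function name `gmm_map`, the component tuple layout, and the scalar features are all assumptions made for illustration. Each component contributes its own linear regressor, and the estimate is the posterior-weighted blend of those regressors. The mapping also sees only the current frame, which is the sense in which it ignores temporal information.

```python
import math


def gmm_map(x, components):
    """Estimate an HF feature y from an LF feature x as E[y | x] under a joint GMM.

    `components` is a hypothetical layout chosen for this sketch: a list of
    (weight, mu_x, mu_y, var_x, cov_xy) tuples, one per mixture component.
    """
    scores = []  # unnormalised posteriors: weight * N(x; mu_x, var_x)
    local = []   # per-component linear regressors evaluated at x
    for weight, mu_x, mu_y, var_x, cov_xy in components:
        lik = math.exp(-0.5 * (x - mu_x) ** 2 / var_x) / math.sqrt(2 * math.pi * var_x)
        scores.append(weight * lik)
        local.append(mu_y + (cov_xy / var_x) * (x - mu_x))
    total = sum(scores)  # may underflow to 0 for x far from every component
    # Posterior-weighted sum of linear maps: locally linear, frame by frame.
    return sum(s / total * y for s, y in zip(scores, local))


# One component reduces to ordinary linear regression of y on x.
single = [(1.0, 0.0, 1.0, 1.0, 0.5)]
estimate = gmm_map(2.0, single)
```

With a single component the result is a plain linear regression of the HF feature on the LF feature. Adding components only switches softly between such regressors, which is the limitation that motivates a model of deeper non-linear, time-dependent structure.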

Original language	English
Pages (from-to)	370-376
Number of pages	7
Journal	Shengxue Xuebao/Acta Acustica
Volume	42
Issue number	3
Publication status	Published - 1 May 2017

Cite this

Wang, Y., Zhao, S., & Kuang, J. (2017). Speech bandwidth extension supported by temporal information. Shengxue Xuebao/Acta Acustica, 42(3), 370-376.

@article{f99c79a7eb9c475bb9fbd28d52407344,

title = "Speech bandwidth extension supported by temporal information",

author = "Yingxue Wang and Shenghui Zhao and Jingming Kuang",

note = "Publisher Copyright: {\textcopyright} 2017 Acta Acustica.",

year = "2017",

month = may,

day = "1",

language = "English",

volume = "42",

pages = "370--376",

journal = "Shengxue Xuebao/Acta Acustica",

issn = "0371-0025",

publisher = "Science Press",

number = "3",

}

TY - JOUR

T1 - Speech bandwidth extension supported by temporal information

AU - Wang, Yingxue

AU - Zhao, Shenghui

AU - Kuang, Jingming

PY - 2017/5/1

Y1 - 2017/5/1

UR - http://www.scopus.com/inward/record.url?scp=85020138250&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:85020138250

SN - 0371-0025

VL - 42

SP - 370

EP - 376

JO - Shengxue Xuebao/Acta Acustica

JF - Shengxue Xuebao/Acta Acustica

IS - 3

ER -
