TY - JOUR
T1 - Speech bandwidth extension based on a time-frequency perception neural network
AU - Xu, Chundong
AU - Ling, Xianpeng
AU - Ying, Dongwen
AU - Wang, Jing
N1 - Publisher Copyright:
© 2021 Editorial Board of Journal of Signal Processing. All rights reserved.
PY - 2021/10
Y1 - 2021/10
N2 - To further improve the performance of deep-learning-based speech bandwidth extension, this paper presents a neural network with an encoder-decoder (codec) structure. The encoder extracts deep features from the data and the decoder reconstructs wideband speech. Between them sits a locality-sensitive hashing self-attention layer, which improves the model's ability to select the relevant deep features. Temporal convolutional networks are used in the codec, which improves the model's ability to learn the context dependencies of speech time-series data. To guide training more accurately, a time-frequency perception loss function is proposed, which helps the model find the optimal mapping from narrowband speech to wideband speech in the time domain, the frequency domain, and the perceptual domain. Subjective and objective experimental results show that the proposed method outperforms both traditional methods and recent deep neural network methods for speech bandwidth extension.
AB - To further improve the performance of deep-learning-based speech bandwidth extension, this paper presents a neural network with an encoder-decoder (codec) structure. The encoder extracts deep features from the data and the decoder reconstructs wideband speech. Between them sits a locality-sensitive hashing self-attention layer, which improves the model's ability to select the relevant deep features. Temporal convolutional networks are used in the codec, which improves the model's ability to learn the context dependencies of speech time-series data. To guide training more accurately, a time-frequency perception loss function is proposed, which helps the model find the optimal mapping from narrowband speech to wideband speech in the time domain, the frequency domain, and the perceptual domain. Subjective and objective experimental results show that the proposed method outperforms both traditional methods and recent deep neural network methods for speech bandwidth extension.
KW - locality-sensitive hashing attention mechanism
KW - speech bandwidth extension
KW - temporal convolutional networks
KW - time-frequency perception loss
UR - http://www.scopus.com/inward/record.url?scp=85204064637&partnerID=8YFLogxK
U2 - 10.16798/j.issn.1003-0530.2021.10.025
DO - 10.16798/j.issn.1003-0530.2021.10.025
M3 - Article
AN - SCOPUS:85204064637
SN - 1003-0530
VL - 37
SP - 2004
EP - 2012
JO - Journal of Signal Processing
JF - Journal of Signal Processing
IS - 10
ER -