TY - JOUR
T1 - Speech separation based on reliable binaural cues with two-stage neural network in noisy-reverberant environments
AU - Li, Ruwei
AU - Li, Tao
AU - Sun, Xiaoyue
AU - Sun, Xingwu
AU - Zhao, Fengnian
N1 - Publisher Copyright: © 2020 Elsevier Ltd
PY - 2020/11
Y1 - 2020/11
AB - Background noise and room reverberation often reduce the reliability of binaural cues and degrade speech quality, especially in non-stationary environments. To address these problems, we propose a novel speech separation algorithm for noisy-reverberant environments based on a two-stage neural network model and a special separation mask. First, a weight matrix is derived by the first-stage neural network to construct reliable binaural cues. The reliable binaural cues, combined with complementary spectral features, are used as the input of the separation DNN. Second, a special separation mask is introduced for noisy-reverberant environments, which can suppress background noise and reduce reverberation. Third, the separation DNN is used as a nonlinear function to estimate the separation mask. The two-stage neural network system is then trained jointly. During joint training, the system adaptively adjusts the weight matrix according to the final error, which is similar to an attention mechanism applied to the binaural features. At the same time, because the binaural cues are more reliable, the neural networks can make better use of the effective information. Finally, the estimated separation mask is used to weight the noisy-reverberant speech to obtain the enhanced speech. Experimental results indicate that the proposed algorithm outperforms the comparison algorithms in different scenarios with various amounts of noise and reverberation.
KW - Binaural cues
KW - Binaural speech separation
KW - Deep Neural Network
KW - Two-stage model
UR - http://www.scopus.com/inward/record.url?scp=85086138108&partnerID=8YFLogxK
U2 - 10.1016/j.apacoust.2020.107445
DO - 10.1016/j.apacoust.2020.107445
M3 - Article
AN - SCOPUS:85086138108
SN - 0003-682X
VL - 168
JO - Applied Acoustics
JF - Applied Acoustics
M1 - 107445
ER -