Speech separation based on reliable binaural cues with two-stage neural network in noisy-reverberant environments

Ruwei Li*, Tao Li, Xiaoyue Sun, Xingwu Sun, Fengnian Zhao

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

5 Citations (Scopus)

Abstract

Background noise and room reverberation often reduce both the reliability of binaural cues and speech quality, especially in non-stationary environments. To address these problems, we propose a novel speech separation algorithm for noisy-reverberant environments, based on a two-stage neural network model and a special separation mask. First, a weight matrix is derived by the first-stage neural network to construct reliable binaural cues. The reliable binaural cues, combined with complementary spectral features, are used as the input of the separation DNN. Second, a special separation mask is introduced for noisy-reverberant environments, which can suppress background noise and reduce reverberation. Third, the separation DNN is used as a nonlinear function to estimate the separation mask. The two-stage neural network system is then trained jointly. During joint training, the system adaptively adjusts the weight matrix according to the final error, which is similar to an attention mechanism applied to the binaural features. At the same time, because the binaural cues are more reliable, the neural networks can make better use of the effective information. Finally, the estimated separation mask is used to weight the noisy-reverberant speech to obtain the enhanced speech. Experimental results indicate that the proposed algorithm outperforms the comparison algorithms in different scenarios with various amounts of noise and reverberation.
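
The data flow described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the network architectures, the feature definitions, or the form of the separation mask, so each stage is replaced here by a single sigmoid layer with random weights, and the mask is assumed to be a real-valued ratio mask in [0, 1]. All names (`reweight_cues`, `estimate_mask`, `separate`, `params`) and all dimensions are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def reweight_cues(cues, w1, b1):
    """Stage 1 (hypothetical stand-in): compute per-element weights in (0, 1)
    and apply them to the binaural cues, attention-style."""
    weights = sigmoid(cues @ w1 + b1)  # same shape as cues
    return weights * cues


def estimate_mask(features, w2, b2):
    """Stage 2 (hypothetical stand-in for the separation DNN): map the
    combined features to one mask value per time-frequency unit."""
    return sigmoid(features @ w2 + b2)


def separate(mixture_mag, cues, spectral, params):
    """Reweight the cues, concatenate them with spectral features, estimate
    a mask, and use the mask to weight the mixture magnitudes."""
    reliable = reweight_cues(cues, params["w1"], params["b1"])
    features = np.concatenate([reliable, spectral], axis=1)
    mask = estimate_mask(features, params["w2"], params["b2"])
    return mask * mixture_mag, mask


# Toy dimensions and random data, chosen only for illustration.
rng = np.random.default_rng(0)
frames, n_cues, n_spec, n_bins = 4, 6, 8, 5
params = {
    "w1": rng.normal(size=(n_cues, n_cues)),
    "b1": np.zeros(n_cues),
    "w2": rng.normal(size=(n_cues + n_spec, n_bins)),
    "b2": np.zeros(n_bins),
}
mixture_mag = np.abs(rng.normal(size=(frames, n_bins)))  # |STFT| of the mixture
cues = rng.normal(size=(frames, n_cues))                 # e.g. ITD/ILD-like features
spectral = rng.normal(size=(frames, n_spec))             # complementary spectral features

enhanced, mask = separate(mixture_mag, cues, spectral, params)
```

In a jointly trained system, the error measured on the estimated mask would be backpropagated through both stages, so the stage-1 weights adapt to the final objective; the sketch shows only the forward pass.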

Original language: English
Article number: 107445
Journal: Applied Acoustics
Volume: 168
Publication status: Published - Nov 2020
Externally published: Yes

Keywords

  • Binaural cues
  • Binaural speech separation
  • Deep Neural Network
  • Two-stage model
