JointNet: Joint Learning for Simultaneous DOA Estimation and Speech Enhancement in Noisy and Reverberant Environments

  • Wenmeng Xiong
  • Maoshen Jia*
  • Jing Zhou
  • Jing Zhang
  • Qing Shen

* Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

In this paper, we design a joint learning network that simultaneously addresses direction of arrival (DOA) estimation and speech enhancement. The proposed network consists of two DOA estimation blocks, a speech enhancement block, and two interaction blocks. Specifically, cross-narrowband modules are employed in both the DOA estimation block and the speech enhancement block to learn the frequency dependencies and temporal correlations of time-frequency (TF) domain microphone signals. Bidirectional interaction blocks are designed to exploit the synergy between the two tasks by feeding DOA information about the sources into the speech enhancement block and feeding the enhanced signals from the speech enhancement block back into the DOA estimation blocks. In this way, the performance of both tasks can be improved compared with independent training. Experiments were conducted on two datasets: the first is generated by convolving simulated room impulse responses (RIRs) with clean speech from the LibriSpeech dataset, while in the second, clean speech from the DNS Challenge dataset is convolved with both simulated and real-world recorded RIRs. The experimental results demonstrate that the proposed joint learning method significantly improves the performance of both DOA estimation and speech enhancement compared with baseline methods.
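
The dataset construction mentioned in the abstract, convolving RIRs with clean speech, can be illustrated with a minimal sketch. This is not the authors' data pipeline: the function names `convolve` and `simulate_microphones`, the one-RIR-per-microphone layout, and all sample values below are assumptions chosen for illustration, and a real setup would use an array library and measured or simulated RIRs of realistic length.

```python
# Illustrative sketch only: names and toy values are assumptions, not from the paper.

def convolve(signal, rir):
    """Full linear convolution of a 1-D signal with a 1-D impulse response."""
    out = [0.0] * (len(signal) + len(rir) - 1)
    for n, s in enumerate(signal):
        for k, h in enumerate(rir):
            out[n + k] += s * h
    return out


def simulate_microphones(clean_speech, rirs):
    """Return one reverberant signal per microphone (one RIR per microphone)."""
    return [convolve(clean_speech, rir) for rir in rirs]


# Toy example: a 4-sample "utterance" and two 3-tap RIRs (hypothetical values).
clean = [1.0, 0.5, 0.0, -0.5]
rirs = [
    [1.0, 0.0, 0.25],  # microphone 1: direct path plus one later reflection
    [0.0, 1.0, 0.5],   # microphone 2: direct path delayed by one sample
]
mics = simulate_microphones(clean, rirs)
```

The one-sample delay between the two toy RIRs stands in for the inter-microphone time difference that a DOA estimator relies on, while the trailing taps stand in for reverberation that a speech enhancement stage would try to suppress.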

Original language: English
Pages (from-to): 596-611
Number of pages: 16
Journal: IEEE Transactions on Audio, Speech and Language Processing
Volume: 34
Publication status: Published - 2026
Externally published: Yes

Keywords

  • Direction of arrival estimation
  • convolutional neural network
  • joint learning
  • long short term memory
  • speech enhancement
