Multiple-Speech-Source DOA Estimation Based on Single-Source Cluster Detection

Lu Li, Maoshen Jia*, Jing Wang, Ruiyuan Cao

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

This study proposes a multiple-speech-source direction-of-arrival (DOA) estimation method based on the distribution characteristics of time-frequency (TF) points dominated by a single source component (i.e., single-source points, SSPs). By exploring the TF distribution characteristics of SSPs, we found that most of them are distributed in clusters in the TF domain. Hence, the concept of a single-source cluster (SSC) is introduced: a group of adjacent TF points dominated by the same sound source. Because SSCs vary in shape and size, an SSC detection method based on point-to-cluster expansion is designed; this method is the research focus of the article. A two-dimensional Gaussian function is introduced to model the theoretical distribution of the DOAs of SSPs, and a cluster expansion rule is proposed based on hypothesis testing of the DOA of a source. Two-dimensional kernel density estimation and peak search are then applied to the detected SSCs to estimate the DOAs and the number of sources. Experimental results in both simulated and real environments show that the proposed method can achieve better DOA estimation performance than some current techniques.
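The final stage named in the abstract (two-dimensional kernel density estimation followed by peak search) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that each detected SSP contributes one DOA sample given as (azimuth, elevation) in degrees, and it uses an isotropic Gaussian kernel. The function names `kde_2d`, `find_peaks_2d`, and `estimate_doas`, the grid step, the bandwidth, and the relative peak threshold are all hypothetical choices made for illustration, and the input data are synthetic.

```python
import numpy as np


def kde_2d(samples, grid_az, grid_el, bandwidth):
    """Isotropic Gaussian KDE on an (azimuth, elevation) grid, all in degrees."""
    az, el = np.meshgrid(grid_az, grid_el, indexing="ij")
    # Wrap azimuth differences into [-180, 180) so that 359 and 1 degree are close.
    d_az = (az[..., None] - samples[:, 0] + 180.0) % 360.0 - 180.0
    d_el = el[..., None] - samples[:, 1]
    kernel = np.exp(-(d_az**2 + d_el**2) / (2.0 * bandwidth**2))
    return kernel.sum(axis=-1) / (2.0 * np.pi * bandwidth**2 * len(samples))


def find_peaks_2d(density, rel_threshold):
    """Grid indices of strict local maxima above rel_threshold * max(density)."""
    n_az, n_el = density.shape
    # Elevation edges get -inf neighbours; azimuth wraps around the circle.
    padded = np.pad(density, ((0, 0), (1, 1)), mode="constant",
                    constant_values=-np.inf)
    padded = np.concatenate([padded[-1:], padded, padded[:1]], axis=0)
    is_peak = density >= rel_threshold * density.max()
    for da in (-1, 0, 1):
        for de in (-1, 0, 1):
            if da == 0 and de == 0:
                continue
            neighbour = padded[1 + da:1 + da + n_az, 1 + de:1 + de + n_el]
            is_peak &= density > neighbour
    return np.argwhere(is_peak)


def estimate_doas(samples, step=5.0, bandwidth=6.0, rel_threshold=0.5):
    """Return estimated DOAs as rows of (azimuth, elevation), sorted by azimuth.

    The number of rows is the estimated number of sources.
    step, bandwidth and rel_threshold are illustrative values (assumptions).
    """
    grid_az = np.arange(0.0, 360.0, step)          # periodic axis, no 360 duplicate
    grid_el = np.arange(-90.0, 90.0 + step, step)  # includes both poles
    density = kde_2d(samples, grid_az, grid_el, bandwidth)
    peaks = find_peaks_2d(density, rel_threshold)
    doas = np.column_stack([grid_az[peaks[:, 0]], grid_el[peaks[:, 1]]])
    return doas[np.argsort(doas[:, 0])]


# Synthetic example: two sources plus a few uniformly scattered outliers.
rng = np.random.default_rng(0)
true_doas = np.array([[60.0, 10.0], [200.0, -20.0]])
clusters = [rng.normal(loc=d, scale=4.0, size=(200, 2)) for d in true_doas]
outliers = np.column_stack([rng.uniform(0, 360, 40), rng.uniform(-60, 60, 40)])
samples = np.vstack(clusters + [outliers])

doas = estimate_doas(samples)
print("estimated number of sources:", len(doas))
print(doas)
```

In this sketch the source count falls out of the peak search rather than being supplied in advance, which mirrors the abstract's statement that both the DOAs and the number of sources are estimated from the detected SSCs. The resolution of the estimate is limited by the grid step, and the threshold trades missed weak sources against spurious peaks.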

Original language: English
Pages (from-to): 3667-3680
Number of pages: 14
Journal: IEEE/ACM Transactions on Audio Speech and Language Processing
Volume: 31
Publication status: Published - 2023

Keywords

  • DOA estimation
  • hypothesis testing
  • single-source cluster detection
