Multiple-Speech-Source DOA Estimation Based on Single-Source Cluster Detection

Lu Li; Maoshen Jia; Jing Wang; Ruiyuan Cao

doi:10.1109/TASLP.2023.3321213

Lu Li, Maoshen Jia^*, Jing Wang, Ruiyuan Cao

^* Corresponding author for this work

School of Information and Electronics

Beijing University of Technology

Research output: Contribution to journal › Article › Peer-reviewed

1 citation (Scopus)

Abstract

This study proposes multiple-speech-source direction-of-arrival (DOA) estimation based on the distribution characteristic of the time-frequency (TF) point dominated by a single-source component (i.e., single-source point, SSP). By exploring the TF distribution characteristics of SSPs, we found that most are distributed in clusters in the TF domain. Hence, the concept of a single-source cluster (SSC) is given, each composed of adjacent TF points from one dominant sound source. Considering that SSCs have different shapes and sizes, an SSC detection method is designed based on point-to-cluster expansion, which is the research focus of this article. A two-dimensional Gaussian function is introduced to model the theoretical distribution of the DOAs of SSPs, and a cluster expansion rule is proposed based on hypothesis testing of the DOA of a source. Two-dimensional kernel density estimation and peak search are adopted to estimate the DOAs and the number of sources using the detected SSCs. Experimental results in both simulated and real environments show that the proposed method can achieve better DOA estimation performance than some current techniques.
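The final stage named in the abstract, two-dimensional kernel density estimation followed by peak search, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the Gaussian kernel bandwidth, the 5-degree grid, and the relative peak threshold are all assumptions chosen for illustration, the input is synthetic (azimuth, elevation) data rather than detected SSCs, and no SSC detection or hypothesis testing is performed.

```python
import math
import random


def azimuth_gap(a, b):
    """Smallest absolute difference between two azimuths, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def kde_peak_search(doas, bandwidth=5.0, step=5, rel_threshold=0.3):
    """Return grid locations of density peaks for (azimuth, elevation) pairs.

    bandwidth, step, and rel_threshold are illustrative assumptions.
    """
    az_grid = list(range(0, 360, step))
    el_grid = list(range(-90, 91, step))

    # Unnormalised Gaussian kernel density evaluated on the grid.
    density = [
        [
            sum(
                math.exp(-(azimuth_gap(az, a) ** 2 + (el - e) ** 2)
                         / (2.0 * bandwidth ** 2))
                for a, e in doas
            )
            for el in el_grid
        ]
        for az in az_grid
    ]

    floor = rel_threshold * max(max(row) for row in density)
    n_az, n_el = len(az_grid), len(el_grid)
    peaks = []
    for i in range(n_az):
        for j in range(n_el):
            value = density[i][j]
            if value < floor:
                continue
            # 8-connected neighbours; azimuth wraps around, elevation does not.
            neighbours = [
                density[(i + di) % n_az][j + dj]
                for di in (-1, 0, 1)
                for dj in (-1, 0, 1)
                if (di, dj) != (0, 0) and 0 <= j + dj < n_el
            ]
            # A peak is a grid cell strictly denser than all its neighbours.
            if all(value > n for n in neighbours):
                peaks.append((az_grid[i], el_grid[j]))
    return peaks


# Synthetic example: two hypothetical sources with noisy per-point DOAs.
random.seed(0)
true_sources = [(60.0, 0.0), (200.0, 20.0)]
doas = [
    (random.gauss(az, 4.0) % 360.0, random.gauss(el, 4.0))
    for az, el in true_sources
    for _ in range(60)
]
peaks = kde_peak_search(doas)
print(len(peaks), peaks)
```

In this sketch the number of returned peaks serves as the source-count estimate and the peak locations as the DOA estimates, quantised to the grid step.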

Original language	English
Pages (from-to)	3667-3680
Number of pages	14
Journal	IEEE/ACM Transactions on Audio Speech and Language Processing
Volume	31
DOI	https://doi.org/10.1109/TASLP.2023.3321213
Publication status	Published - 2023

Cite this

Li, L., Jia, M., Wang, J., & Cao, R. (2023). Multiple-Speech-Source DOA Estimation Based on Single-Source Cluster Detection. IEEE/ACM Transactions on Audio Speech and Language Processing, 31, 3667-3680. https://doi.org/10.1109/TASLP.2023.3321213

@article{bf21bb14cc204b7285d1f7b7513a998d,

title = "Multiple-Speech-Source DOA Estimation Based on Single-Source Cluster Detection",

abstract = "This study proposes multiple-speech-source direction-of-arrival (DOA) estimation based on the distribution characteristic of the time-frequency (TF) point dominated by a single-source component (i.e., single-source point, SSP). By exploring the TF distribution characteristics of SSPs, we found that most are distributed in clusters in the TF domain. Hence, the concept of a single-source cluster (SSC) is given, each composed of adjacent TF points from one dominant sound source. Considering that SSCs have different shapes and sizes, an SSC detection method is designed based on point-to-cluster expansion, which is the research focus of this article. A two-dimensional Gaussian function is introduced to model the theoretical distribution of the DOAs of SSPs, and a cluster expansion rule is proposed based on hypothesis testing of the DOA of a source. Two-dimensional kernel density estimation and peak search are adopted to estimate the DOAs and the number of sources using the detected SSCs. Experimental results in both simulated and real environments show that the proposed method can achieve better DOA estimation performance than some current techniques.",

keywords = "DOA estimation, hypothesis testing, single-source cluster detection",

author = "Lu Li and Maoshen Jia and Jing Wang and Ruiyuan Cao",

note = "Publisher Copyright: {\textcopyright} 2014 IEEE.",

year = "2023",

doi = "10.1109/TASLP.2023.3321213",

language = "English",

volume = "31",

pages = "3667--3680",

journal = "IEEE/ACM Transactions on Audio Speech and Language Processing",

issn = "2329-9290",

publisher = "IEEE Advancing Technology for Humanity",

}

TY - JOUR

T1 - Multiple-Speech-Source DOA Estimation Based on Single-Source Cluster Detection

AU - Li, Lu

AU - Jia, Maoshen

AU - Wang, Jing

AU - Cao, Ruiyuan

PY - 2023

Y1 - 2023

N2 - This study proposes multiple-speech-source direction-of-arrival (DOA) estimation based on the distribution characteristic of the time-frequency (TF) point dominated by a single-source component (i.e., single-source point, SSP). By exploring the TF distribution characteristics of SSPs, we found that most are distributed in clusters in the TF domain. Hence, the concept of a single-source cluster (SSC) is given, each composed of adjacent TF points from one dominant sound source. Considering that SSCs have different shapes and sizes, an SSC detection method is designed based on point-to-cluster expansion, which is the research focus of this article. A two-dimensional Gaussian function is introduced to model the theoretical distribution of the DOAs of SSPs, and a cluster expansion rule is proposed based on hypothesis testing of the DOA of a source. Two-dimensional kernel density estimation and peak search are adopted to estimate the DOAs and the number of sources using the detected SSCs. Experimental results in both simulated and real environments show that the proposed method can achieve better DOA estimation performance than some current techniques.

KW - DOA estimation

KW - hypothesis testing

KW - single-source cluster detection

UR - http://www.scopus.com/inward/record.url?scp=85174852801&partnerID=8YFLogxK

U2 - 10.1109/TASLP.2023.3321213

DO - 10.1109/TASLP.2023.3321213

M3 - Article

AN - SCOPUS:85174852801

SN - 2329-9290

VL - 31

SP - 3667

EP - 3680

JO - IEEE/ACM Transactions on Audio Speech and Language Processing

JF - IEEE/ACM Transactions on Audio Speech and Language Processing

ER -
