TY - JOUR
T1 - Deep Kronecker Product Beamforming for Large-Scale Microphone Arrays
AU - Meng, Weixin
AU - Li, Xiaoyu
AU - Li, Andong
AU - Luo, Xiaoxue
AU - Yan, Shefeng
AU - Li, Xiaodong
AU - Zheng, Chengshi
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Although deep learning based beamformers have achieved promising performance using small microphone arrays, they suffer from performance degradation in very challenging environments, such as extremely low Signal-to-Noise Ratio (SNR) environments, e.g., SNR $\le$-10 dB. A large-scale microphone array with dozens or hundreds of microphones can improve the performance of beamformers in these challenging scenarios because of its high spatial resolution. While a dramatic increase in the number of microphones leads to feature redundancy, causing difficulties in feature extraction and network training. As an attempt to improve the performance of deep beamformers for speech extraction in very challenging scenarios, this paper proposes a novel all neural Kronecker product beamforming denoted by ANKP-BF for large-scale microphone arrays by taking the following two aspects into account. Firstly, a larger microphone array can provide higher performance of spatial filtering when compared with a small microphone array, and deep neural networks are introduced for their powerful non-linear modeling capability in the speech extraction task. Secondly, the feature redundancy problem is solved by introducing the Kronecker product rule to decompose the original one high-dimension weight vector into the Kronecker product of two much lower-dimensional weight vectors. The proposed ANKP-BF is designed to operate in an end-to-end manner. Extensive experiments are conducted on simulated large-scale microphone-array signals using the DNS-Challenge corpus and WSJ0-SI84 corpus, and the real recordings in a semi-anechoic room and outdoor scenes are also used to evaluate and compare the performance of different methods. Quantitative results demonstrate that the proposed method outperforms existing advanced baselines in terms of multiple objective metrics, especially in very low SNR environments.
AB - Although deep learning based beamformers have achieved promising performance using small microphone arrays, they suffer from performance degradation in very challenging environments, such as extremely low Signal-to-Noise Ratio (SNR) environments, e.g., SNR $\le$-10 dB. A large-scale microphone array with dozens or hundreds of microphones can improve the performance of beamformers in these challenging scenarios because of its high spatial resolution. While a dramatic increase in the number of microphones leads to feature redundancy, causing difficulties in feature extraction and network training. As an attempt to improve the performance of deep beamformers for speech extraction in very challenging scenarios, this paper proposes a novel all neural Kronecker product beamforming denoted by ANKP-BF for large-scale microphone arrays by taking the following two aspects into account. Firstly, a larger microphone array can provide higher performance of spatial filtering when compared with a small microphone array, and deep neural networks are introduced for their powerful non-linear modeling capability in the speech extraction task. Secondly, the feature redundancy problem is solved by introducing the Kronecker product rule to decompose the original one high-dimension weight vector into the Kronecker product of two much lower-dimensional weight vectors. The proposed ANKP-BF is designed to operate in an end-to-end manner. Extensive experiments are conducted on simulated large-scale microphone-array signals using the DNS-Challenge corpus and WSJ0-SI84 corpus, and the real recordings in a semi-anechoic room and outdoor scenes are also used to evaluate and compare the performance of different methods. Quantitative results demonstrate that the proposed method outperforms existing advanced baselines in terms of multiple objective metrics, especially in very low SNR environments.
KW - Kronecker product
KW - large-scale microphone array
KW - neural network
KW - speech extraction
UR - http://www.scopus.com/inward/record.url?scp=85204195830&partnerID=8YFLogxK
U2 - 10.1109/TASLP.2024.3459430
DO - 10.1109/TASLP.2024.3459430
M3 - Article
AN - SCOPUS:85204195830
SN - 2329-9290
VL - 32
SP - 4537
EP - 4553
JO - IEEE/ACM Transactions on Audio Speech and Language Processing
JF - IEEE/ACM Transactions on Audio Speech and Language Processing
ER -