TY - JOUR
T1 - Automatic Bird Sound Source Separation Based on Passive Acoustic Devices in Wild Environment
AU - Xie, Jiangjian
AU - Shi, Yuwei
AU - Ni, Dongming
AU - Milling, Manuel
AU - Liu, Shuo
AU - Zhang, Junguo
AU - Qian, Kun
AU - Schuller, Björn W.
N1 - Publisher Copyright:
IEEE
PY - 2024
Y1 - 2024
N2 - The Internet of Things (IoT)-based passive acoustic monitoring (PAM) has shown great potential in large-scale remote bird monitoring. However, field recordings often contain overlapping signals, making precise extraction of bird information challenging. To address this challenge, first, the inter-channel spatial feature is chosen as complementary information to the spectral feature to capture additional spatial correlations between the sources. Then, an end-to-end model named BACPPNet, built on Deeplabv3plus and enhanced with the polarized self-attention mechanism, estimates the spectral magnitude mask (SMM) for separating bird vocalizations. Finally, the separated bird vocalizations are recovered from the SMMs and the spectrogram of the mixed audio using the inverse short-time Fourier transform (ISTFT). We evaluate the proposed method on the generated mixed dataset. Experiments show that our method separates bird vocalizations from mixed audio with RMSE, SDR, SIR, SAR, and STOI values of 2.82, 10.00 dB, 29.90 dB, 11.08 dB, and 0.66, respectively, outperforming existing methods. Furthermore, the separated bird vocalizations exhibit the smallest drop in average classification accuracy. This indicates that our method outperforms the compared separation methods in bird sound separation while preserving the fidelity of the separated sound sources, which might help us better understand wild bird sound recordings.
KW - Acoustics
KW - Bird sound separation
KW - Birds
KW - Forestry
KW - Monitoring
KW - Recording
KW - Source separation
KW - Task analysis
KW - multi-channel audio processing
KW - polarized self-attention mechanism
UR - http://www.scopus.com/inward/record.url?scp=85182946866&partnerID=8YFLogxK
U2 - 10.1109/JIOT.2024.3354036
DO - 10.1109/JIOT.2024.3354036
M3 - Article
AN - SCOPUS:85182946866
SN - 2327-4662
SP - 1
JO - IEEE Internet of Things Journal
JF - IEEE Internet of Things Journal
ER -