Multi-task based sound localization model

Song Tao, Qu Tianshu, Chen Jing*

*Corresponding author for this work

Research output: Contribution to conference › Paper › peer-review

1 Citation (Scopus)

Abstract

For machine hearing in complex scenes (i.e. reverberation, multiple sound sources), sound localization either serves as the front-end or is implicitly encoded in speech enhancement models. However, extracting binaural cues for sound localization depends on the clarity of the input speech signals, so speech enhancement (i.e. dereverberation or denoising) can benefit sound localization processing. Based on this idea, a multi-task based sound localization model is proposed in this study. The proposed model takes the waveform as input and simultaneously estimates the azimuth of the sound source and a time-frequency (T-F) mask. Localization experiments were performed using binaural simulation in reverberant environments, and the results show that, compared to the single-task sound localization method, the presence of the speech enhancement task can improve localization performance.
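The multi-task idea in the abstract — one shared representation feeding both an azimuth estimate and a T-F mask — can be sketched as follows. This is an illustrative NumPy toy, not the authors' model: the paper's network takes the raw waveform, whereas this sketch uses binaural magnitude spectrograms as features for brevity, and all dimensions (16 kHz sampling, 37 azimuth classes, hidden size 128) are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, chosen only for illustration.
N_SAMPLES = 16000        # 1 s of audio per ear at 16 kHz
N_FFT, HOP = 512, 256
N_AZIMUTHS = 37          # e.g. -90 deg .. +90 deg in 5-degree steps
FEAT_DIM = N_FFT // 2 + 1

def stft_mag(x, n_fft=N_FFT, hop=HOP):
    """Magnitude spectrogram of a mono signal: (frames, freq bins)."""
    win = np.hanning(n_fft)
    frames = [np.abs(np.fft.rfft(win * x[s:s + n_fft]))
              for s in range(0, len(x) - n_fft + 1, hop)]
    return np.stack(frames)

class MultiTaskLocalizer:
    """Shared encoder with two heads: azimuth classification (localization)
    and a per-frame T-F mask in [0, 1] (speech enhancement)."""
    def __init__(self, feat_dim=2 * FEAT_DIM, hidden=128):
        self.W_enc = 0.01 * rng.standard_normal((feat_dim, hidden))
        self.W_azi = 0.01 * rng.standard_normal((hidden, N_AZIMUTHS))
        self.W_mask = 0.01 * rng.standard_normal((hidden, FEAT_DIM))

    def forward(self, left, right):
        # Shared representation from the concatenated binaural spectrograms.
        feats = np.concatenate([stft_mag(left), stft_mag(right)], axis=1)
        h = np.tanh(feats @ self.W_enc)                   # (frames, hidden)
        # Head 1: azimuth posterior, pooled over time (softmax).
        logits = h.mean(axis=0) @ self.W_azi
        azi_post = np.exp(logits - logits.max())
        azi_post /= azi_post.sum()
        # Head 2: per-frame T-F mask (sigmoid keeps it in [0, 1]).
        mask = 1.0 / (1.0 + np.exp(-(h @ self.W_mask)))
        return azi_post, mask

left = rng.standard_normal(N_SAMPLES)
right = rng.standard_normal(N_SAMPLES)
azi, mask = MultiTaskLocalizer().forward(left, right)
```

Training would minimize a weighted sum of a localization loss (e.g. cross-entropy over `azi`) and an enhancement loss (e.g. mean-squared error of `mask` against an ideal ratio mask); the shared encoder is what lets the enhancement task regularize the localization features, which is the effect the abstract reports.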

Original language: English
Publication status: Published - 2020
Externally published: Yes
Event: 148th Audio Engineering Society International Convention 2020 - Vienna, Virtual, Online, Austria
Duration: 2 Jun 2020 - 5 Jun 2020

Conference

Conference: 148th Audio Engineering Society International Convention 2020
Country/Territory: Austria
City: Vienna, Virtual, Online
Period: 2/06/20 - 5/06/20
