Abstract
For machine hearing in complex scenes (e.g., reverberation and multiple sound sources), sound localization either serves as a front-end or is implicitly encoded in speech enhancement models. However, extracting binaural cues for sound localization depends on the clarity of the input speech signals, so speech enhancement (i.e., dereverberation or denoising) can benefit sound localization. Based on this idea, a multi-task sound localization model is proposed in this study. The proposed model takes waveforms as input and simultaneously estimates the azimuth of the sound source and a time-frequency (T-F) mask. Localization experiments were performed with binaural simulations in reverberant environments, and the results show that, compared to a single-task sound localization method, adding the speech enhancement task improves localization performance.
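The abstract does not specify the network architecture. The sketch below is a minimal, hypothetical PyTorch illustration of the multi-task idea it describes (a shared encoder with one head estimating the source azimuth and another estimating a T-F mask); the STFT front-end, layer sizes, GRU encoder, and azimuth grid are all assumptions for illustration, not details from the paper.

```python
import torch
import torch.nn as nn

class MultiTaskLocalizer(nn.Module):
    """Hypothetical sketch: shared encoder with an azimuth head and a T-F mask head.

    All design choices (STFT parameters, GRU encoder, 72 azimuth classes) are
    illustrative assumptions, not the architecture used in the paper.
    """

    def __init__(self, n_fft=512, hop=160, n_azimuth=72):
        super().__init__()
        self.n_fft, self.hop = n_fft, hop
        freq_bins = n_fft // 2 + 1
        # Shared encoder over stacked left/right magnitude spectrograms.
        self.encoder = nn.GRU(input_size=2 * freq_bins, hidden_size=256,
                              num_layers=2, batch_first=True)
        # Task head 1: per-frame azimuth posterior (classification over azimuth bins).
        self.azimuth_head = nn.Linear(256, n_azimuth)
        # Task head 2: per-frame T-F mask for speech enhancement.
        self.mask_head = nn.Sequential(nn.Linear(256, freq_bins), nn.Sigmoid())

    def forward(self, wav_left, wav_right):
        # wav_left, wav_right: (batch, samples) binaural waveforms.
        window = torch.hann_window(self.n_fft, device=wav_left.device)
        spec_l = torch.stft(wav_left, self.n_fft, self.hop, window=window,
                            return_complex=True).abs()   # (batch, freq, frames)
        spec_r = torch.stft(wav_right, self.n_fft, self.hop, window=window,
                            return_complex=True).abs()
        feats = torch.cat([spec_l, spec_r], dim=1).transpose(1, 2)  # (batch, frames, 2*freq)
        hidden, _ = self.encoder(feats)
        azimuth_logits = self.azimuth_head(hidden)   # (batch, frames, n_azimuth)
        tf_mask = self.mask_head(hidden)             # (batch, frames, freq_bins)
        return azimuth_logits, tf_mask
```

Under this sketch, training would combine the two tasks, for example a cross-entropy loss on the azimuth logits plus a mean-squared-error loss between the predicted mask and an ideal ratio mask, so that the enhancement objective regularizes the shared representation used for localization.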
| Original language | English |
|---|---|
| Publication status | Published - 2020 |
| Externally published | Yes |
| Event | 148th Audio Engineering Society International Convention 2020, Vienna, Virtual, Online, Austria. Duration: 2 Jun 2020 → 5 Jun 2020 |
Conference

| Conference | 148th Audio Engineering Society International Convention 2020 |
|---|---|
| Country/Territory | Austria |
| City | Vienna, Virtual, Online |
| Period | 2/06/20 → 5/06/20 |