Abstract
For machine hearing in complex scenes (e.g. reverberation, multiple sound sources), sound localization either serves as the front end or is implicitly encoded in speech enhancement models. However, the extraction of binaural cues for sound localization depends on the clarity of the input speech signals, so speech enhancement (e.g. dereverberation or denoising) can benefit sound localization. Based on this idea, a multi-task sound localization model is proposed in this study. The proposed model takes the waveform as input and simultaneously estimates the azimuth of the sound source and a time-frequency (T-F) mask. Localization experiments were performed using binaural simulation in reverberant environments, and the results show that, compared with the single-task sound localization method, the presence of the speech enhancement task can improve localization performance.
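As an illustration of the multi-task idea only, the sketch below combines an azimuth loss with a T-F mask loss into a single training objective. The abstract does not state the loss functions, how azimuth is represented, or how the two tasks are weighted, so everything here is an assumption: azimuth is treated as a classification over discrete angle bins, the mask loss is a mean squared error, and the names `azimuth_cross_entropy`, `mask_mse`, `multitask_loss`, and `mask_weight` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def azimuth_cross_entropy(logits, target_index):
    """Cross-entropy for azimuth treated as a class index (assumption)."""
    probs = softmax(logits)
    return -math.log(probs[target_index])


def mask_mse(predicted_mask, target_mask):
    """Mean squared error between two T-F masks given as lists of frames."""
    squared_error = 0.0
    count = 0
    for pred_frame, target_frame in zip(predicted_mask, target_mask):
        for p, t in zip(pred_frame, target_frame):
            squared_error += (p - t) ** 2
            count += 1
    return squared_error / count


def multitask_loss(azimuth_logits, azimuth_target,
                   predicted_mask, target_mask, mask_weight=0.5):
    """Weighted sum of the two task losses; mask_weight is a hypothetical knob."""
    localization = azimuth_cross_entropy(azimuth_logits, azimuth_target)
    enhancement = mask_mse(predicted_mask, target_mask)
    return localization + mask_weight * enhancement
```

Setting `mask_weight` to 0 reduces this objective to the localization loss alone, which corresponds to a single-task baseline of the kind the abstract compares against.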
Original language | English |
---|---|
Publication status | Published - 2020 |
Externally published | Yes |
Event | 148th Audio Engineering Society International Convention 2020 - Vienna, Virtual, Online, Austria Duration: 2 Jun 2020 → 5 Jun 2020 |
Conference
Conference | 148th Audio Engineering Society International Convention 2020 |
---|---|
Country/Territory | Austria |
City | Vienna, Virtual, Online |
Period | 2/06/20 → 5/06/20 |