Multi-task based sound localization model

Song Tao; Qu Tianshu; Chen Jing

Corresponding author: Chen Jing

Peking University

Research output: Contribution to conference › Paper › Peer-reviewed

1 Citation (Scopus)

Abstract

For machine hearing in complex scenes (e.g. reverberation and multiple sound sources), sound localization either serves as a front-end or is implicitly encoded in speech enhancement models. However, extracting binaural cues for sound localization depends on the clarity of the input speech signals, so speech enhancement (e.g. dereverberation or denoising) can benefit sound localization. Based on this idea, this study proposes a multi-task sound localization model. The proposed model takes the waveform as input and simultaneously estimates the azimuth of the sound source and a time-frequency (T-F) mask. Localization experiments were performed using binaural simulation in reverberant environments, and the results show that, compared with the single-task sound localization method, adding the speech enhancement task can improve localization performance.
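The abstract describes one model trained on two tasks at once: azimuth estimation and T-F mask estimation. As a rough illustration only, the sketch below shows one common way such a joint objective can be formed: azimuth is treated as classification over discrete directions (cross-entropy), and the mask head is scored with mean squared error against a ratio-mask target. Everything here is a hypothetical assumption for illustration: the azimuth grid (-90 to +90 degrees in 5-degree steps), the ratio-mask definition, the `mask_weight` factor and all function names. The abstract does not specify the authors' network, mask type, or loss.

```python
import math
from typing import List, Sequence

# Hypothetical azimuth grid: -90 to +90 degrees in 5-degree steps (37 classes).
AZ_MIN_DEG = -90.0
AZ_STEP_DEG = 5.0
NUM_AZ_CLASSES = 37


def azimuth_to_class(azimuth_deg: float) -> int:
    """Map an azimuth in degrees to the nearest class index, clipped to the grid."""
    idx = round((azimuth_deg - AZ_MIN_DEG) / AZ_STEP_DEG)
    return min(max(idx, 0), NUM_AZ_CLASSES - 1)


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]


def azimuth_loss(logits: Sequence[float], target_class: int) -> float:
    """Cross-entropy for the localization (classification) head."""
    return -log_softmax(logits)[target_class]


def ratio_mask(speech_mag: Sequence[Sequence[float]],
               noise_mag: Sequence[Sequence[float]],
               eps: float = 1e-8) -> List[List[float]]:
    """Assumed mask target per T-F bin: |S|^2 / (|S|^2 + |N|^2)."""
    return [
        [s * s / (s * s + n * n + eps) for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(speech_mag, noise_mag)
    ]


def mask_loss(pred: Sequence[Sequence[float]],
              target: Sequence[Sequence[float]]) -> float:
    """Mean squared error between predicted and target masks over all T-F bins."""
    diffs = [(p - t) ** 2
             for p_row, t_row in zip(pred, target)
             for p, t in zip(p_row, t_row)]
    return sum(diffs) / len(diffs)


def multitask_loss(az_logits: Sequence[float], az_target: int,
                   mask_pred: Sequence[Sequence[float]],
                   mask_target: Sequence[Sequence[float]],
                   mask_weight: float = 1.0) -> float:
    """Joint objective: localization loss plus weighted mask-estimation loss."""
    return (azimuth_loss(az_logits, az_target)
            + mask_weight * mask_loss(mask_pred, mask_target))
```

For example, with uniform logits over all 37 classes the localization term equals log(37) regardless of the target, and `azimuth_to_class(30.0)` gives class 24. The weighted sum is only one design choice; the relative weight of the two tasks would normally be tuned.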

Original language	English
Publication status	Published - 2020
Externally published	Yes
Event	148th Audio Engineering Society International Convention 2020 - Vienna, Virtual, Online, Austria. Duration: 2 Jun 2020 → 5 Jun 2020

Conference

Conference	148th Audio Engineering Society International Convention 2020
Country/Territory	Austria
City	Vienna, Virtual, Online
Period	2/06/20 → 5/06/20

Other files and links

Link to publication in Scopus: http://www.scopus.com/inward/record.url?scp=85091601797&partnerID=8YFLogxK

Cite this

@conference{15bd493905b14381a6686649215f2a43,

title = "Multi-task based sound localization model",


author = "Song, Tao and Qu, Tianshu and Chen, Jing",

note = "Publisher Copyright: {\textcopyright} 2020 148th Audio Engineering Society International Convention. All rights reserved.; 148th Audio Engineering Society International Convention 2020 ; Conference date: 02-06-2020 Through 05-06-2020",

year = "2020",

language = "English",

}

TY  - CONF
T1  - Multi-task based sound localization model
AU  - Song, Tao
AU  - Qu, Tianshu
AU  - Chen, Jing
PY  - 2020
Y1  - 2020
UR  - http://www.scopus.com/inward/record.url?scp=85091601797&partnerID=8YFLogxK
M3  - Paper
AN  - SCOPUS:85091601797
T2  - 148th Audio Engineering Society International Convention 2020
Y2  - 2 June 2020 through 5 June 2020
ER  - 
