PGSS: Pitch-Guided Speech Separation

Xiang Li, Yiwen Wang, Yifan Sun, Xihong Wu, Jing Chen

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Peer-reviewed

Abstract

Monaural speech separation aims to separate concurrent speakers from a mixture recorded with a single microphone. Inspired by the effect of pitch priming in auditory scene analysis (ASA), this work proposes a novel pitch-guided speech separation framework. Its main advantage is that, because pitch contours serve as the primary means of guiding separation toward the target speaker, it avoids both the permutation problem and the unknown-speaker-number problem found in general separation models. In addition, adversarial training is applied instead of a traditional time-frequency mask to improve the perceptual quality of the separated speech. Specifically, the framework consists of two phases: pitch extraction and speech separation. The first phase extracts pitch contour candidates for each speaker from the mixture, modeling the bottom-up process of ASA. In the second phase, which models the effect of pitch priming, any of these pitch contours can be selected as the condition for a conditional generative adversarial network (CGAN) that separates the corresponding speaker. Experiments on the WSJ0-2mix corpus show that, compared with the baseline models, the proposed approaches achieve higher pitch extraction accuracy and better separation performance, and that they have the potential to be applied to state-of-the-art (SOTA) architectures.
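The abstract's central claim is that conditioning on a pitch contour sidesteps the permutation and unknown-speaker-number problems. The Python sketch below illustrates that reasoning only; it is not the authors' implementation. `pit_loss` is a generic permutation-invariant loss of the kind used by unconditioned separators, while `pitch_conditioned_loss`, `separate_all` and the `separator` callable are hypothetical names introduced here. The `separator` stands in for the paper's CGAN, which is not modeled, and representing signals as plain float lists scored with mean squared error is likewise an assumption made for simplicity.

```python
from itertools import permutations
from typing import Callable, List, Sequence

Signal = List[float]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length signals (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def pit_loss(estimates: Sequence[Signal], references: Sequence[Signal]) -> float:
    """Permutation-invariant loss for an unconditioned separator.

    The model emits one output per speaker in arbitrary order, so the loss
    must try every output-to-speaker assignment (N! of them) and keep the best.
    Assumes len(estimates) == len(references).
    """
    n = len(references)
    return min(
        sum(mse(estimates[i], references[p[i]]) for i in range(n)) / n
        for p in permutations(range(n))
    )


def pitch_conditioned_loss(
    estimate: Signal, references: Sequence[Signal], target: int
) -> float:
    """Hypothetical loss for a pitch-guided separator.

    The selected pitch contour already identifies the target speaker, so the
    reference is fixed and no permutation search is needed.
    """
    return mse(estimate, references[target])


def separate_all(
    mixture: Signal,
    pitch_candidates: Sequence[Sequence[float]],
    separator: Callable[[Signal, Sequence[float]], Signal],
) -> List[Signal]:
    """Hypothetical phase 2 applied to every contour found in phase 1.

    One separator call per pitch candidate, so the number of outputs follows
    the number of extracted contours rather than a fixed model output count.
    """
    return [separator(mixture, contour) for contour in pitch_candidates]


if __name__ == "__main__":
    refs = [[1.0, 0.0], [0.0, 1.0]]
    swapped = [[0.0, 1.0], [1.0, 0.0]]  # correct signals, wrong output order
    print(pit_loss(swapped, refs))  # 0.0: the permutation search fixes the order
    print(pitch_conditioned_loss(swapped[0], refs, target=1))  # 0.0
    # Three extracted contours give three outputs (identity separator as a stub).
    outputs = separate_all([0.5, 0.5], [[100.0], [200.0], [150.0]], lambda m, f0: list(m))
    print(len(outputs))  # 3
```

The design point the sketch makes is that the unconditioned loss grows factorially with the number of speakers, whereas the conditioned loss is a single comparison whose target is chosen by the pitch contour.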

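Original language: English
Host publication title: AAAI-23 Technical Tracks 11
Editors: Brian Williams, Yiling Chen, Jennifer Neville
Publisher: AAAI Press
Pages: 13130-13138
Number of pages: 9
ISBN (electronic): 9781577358800
Publication status: Published - 27 Jun 2023
Externally published
Event: 37th AAAI Conference on Artificial Intelligence, AAAI 2023 - Washington, United States
Duration: 7 Feb 2023 to 14 Feb 2023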

Publication series

Name: Proceedings of the 37th AAAI Conference on Artificial Intelligence, AAAI 2023
Volume: 37

Conference

Conference: 37th AAAI Conference on Artificial Intelligence, AAAI 2023
Country/Territory: United States
City: Washington
Period: 7/02/23 to 14/02/23
