TY - GEN
T1 - PGSS: Pitch-Guided Speech Separation
T2 - 37th AAAI Conference on Artificial Intelligence, AAAI 2023
AU - Li, Xiang
AU - Wang, Yiwen
AU - Sun, Yifan
AU - Wu, Xihong
AU - Chen, Jing
N1 - Publisher Copyright:
Copyright © 2023, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2023/6/27
Y1 - 2023/6/27
AB - Monaural speech separation aims to separate concurrent speakers from a single-microphone mixture recording. Inspired by the effect of pitch priming in auditory scene analysis (ASA) mechanisms, a novel pitch-guided speech separation framework is proposed in this work. The prominent advantage of this framework is that both the permutation problem and the unknown-speaker-number problem present in general models can be avoided by using pitch contours as the primary cue to guide separation of the target speaker. In addition, adversarial training is applied, instead of a traditional time-frequency mask, to improve the perceptual quality of the separated speech. Specifically, the proposed framework can be divided into two phases: pitch extraction and speech separation. The former extracts pitch contour candidates for each speaker from the mixture, modeling the bottom-up process in ASA. Any pitch contour can then be selected as the condition in the second phase to separate the corresponding speaker, where a conditional generative adversarial network (CGAN) is applied; this second phase models the effect of pitch priming in ASA. Experiments on the WSJ0-2mix corpus show that the proposed approaches achieve higher pitch extraction accuracy and better separation performance than the baseline models, and have the potential to be applied to state-of-the-art (SOTA) architectures.
UR - http://www.scopus.com/inward/record.url?scp=85167997397&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85167997397
T3 - Proceedings of the 37th AAAI Conference on Artificial Intelligence, AAAI 2023
SP - 13130
EP - 13138
BT - AAAI-23 Technical Tracks 11
A2 - Williams, Brian
A2 - Chen, Yiling
A2 - Neville, Jennifer
PB - AAAI Press
Y2 - 7 February 2023 through 14 February 2023
ER -