Single-Channel Speech Separation Integrating Pitch Information Based on a Multi Task Learning Framework

Xiang Li, Rui Liu, Tao Song, Xihong Wu, Jing Chen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

3 Citations (Scopus)

Abstract

Pitch is a critical cue for speech separation in human auditory perception. Although pitch tracking in single-talker speech succeeds in many applications, extracting pitch information from speech mixtures remains a challenging problem in machine perception. In this paper, we aimed to combine speech separation and pitch tracking so that they benefit from each other. A multi-task learning framework was proposed, in which a unified objective that considered both speech separation and pitch tracking was used, based on utterance-level permutation invariant training (uPIT) as well as deep clustering (DPCL). In such a framework, the two tasks were optimized simultaneously and could benefit from each other through the shared layers in the networks. Experimental results indicated that the proposed multi-task framework outperformed the corresponding single-task framework in terms of both speech separation and pitch tracking. The improvement was more significant for challenging same-gender mixtures.
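
To illustrate the idea of a unified objective under utterance-level permutation invariant training, the following is a minimal sketch and not the authors' implementation. The function name `upit_multitask_loss`, the weight `alpha`, the array shapes, and the use of mean squared error for both the separation and pitch terms are all assumptions made for illustration; the abstract does not specify the loss form, and the DPCL branch is not shown.

```python
from itertools import permutations

import numpy as np


def upit_multitask_loss(est_spec, ref_spec, est_pitch, ref_pitch, alpha=0.5):
    """Hypothetical joint loss: separation error plus weighted pitch error.

    est_spec, ref_spec:   arrays of shape (sources, frames, freq_bins)
    est_pitch, ref_pitch: arrays of shape (sources, frames)
    alpha:                assumed weight on the pitch term (not from the paper)

    One speaker permutation is chosen for the whole utterance (the uPIT idea),
    and the same permutation is applied to both tasks.
    """
    n_src = est_spec.shape[0]
    best_loss, best_perm = np.inf, None
    for perm in permutations(range(n_src)):
        idx = list(perm)
        # Assumed MSE on spectra for the separation task.
        sep = np.mean((est_spec[idx] - ref_spec) ** 2)
        # Assumed MSE on pitch tracks for the pitch-tracking task.
        pit = np.mean((est_pitch[idx] - ref_pitch) ** 2)
        total = sep + alpha * pit
        if total < best_loss:
            best_loss, best_perm = total, perm
    return best_loss, best_perm


# Toy check: the estimates are the references with the two speakers swapped.
rng = np.random.default_rng(0)
ref_spec = rng.random((2, 4, 3))
ref_pitch = rng.random((2, 4))
loss, perm = upit_multitask_loss(ref_spec[::-1], ref_spec,
                                 ref_pitch[::-1], ref_pitch)
```

In this toy case the search selects the swapped assignment, so the speaker order of the outputs does not affect the loss. In a real system both terms would be computed on network outputs that share lower layers, which is how the two tasks can inform each other.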

Original language: English
Title of host publication: 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 7279-7283
Number of pages: 5
ISBN (Electronic): 9781509066315
Publication status: Published - May 2020
Externally published
Event: 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Barcelona, Spain
Duration: 4 May 2020 to 8 May 2020

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 2020-May
ISSN (Print): 1520-6149

Conference

Conference: 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020
Country/Territory: Spain
City: Barcelona
Period: 4/05/20 to 8/05/20
