DurIAN-SC: Duration informed attention network based singing voice conversion system

Liqiang Zhang, Chengzhu Yu, Heng Lu, Chao Weng, Chunlei Zhang, Yusong Wu, Xiang Xie, Zijin Li, Dong Yu

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-reviewed

20 citations (Scopus)

Abstract

Singing voice conversion converts the timbre of the source singing to the target speaker's voice while keeping the singing content the same. However, singing data for a target speaker is much more difficult to collect than normal speech data. In this paper, we introduce a singing voice conversion algorithm that is capable of generating high-quality singing in the target speaker's voice using only his or her normal speech data. First, we integrate the training and conversion processes of speech and singing into one framework by unifying the features used in standard speech synthesis and singing synthesis systems. In this way, normal speech data can also contribute to singing voice conversion training, making the singing voice conversion system more robust, especially when the singing database is small. Moreover, in order to achieve one-shot singing voice conversion, a speaker embedding module is developed using both speech and singing data, which provides target speaker identity information during conversion. Experiments indicate that the proposed singing conversion system can convert source singing to high-quality singing in the target speaker's voice with only 20 seconds of the target speaker's enrollment speech data.

Original language: English
Title of host publication: Interspeech 2020
Publisher: International Speech Communication Association
Pages: 1231-1235
Number of pages: 5
ISBN (Print): 9781713820697
Publication status: Published - 2020
Event: 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020 - Shanghai, China
Duration: 25 Oct 2020 to 29 Oct 2020

Publication series

Name: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2020-October
ISSN (Print): 2308-457X
ISSN (Electronic): 1990-9772

Conference

Conference: 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020
Country/Territory: China
City: Shanghai
Period: 25/10/20 to 29/10/20
