Audio-Visual Speech Separation Using I-Vectors

Yiyu Luo, Jing Wang, Xinyao Wang, Liang Wen, Lizhong Wang

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

7 Citations (Scopus)

Abstract

Speech separation is the task of extracting target speech from background interference. In applications such as home devices or office meetings, prior knowledge about the likely speakers is available and can be leveraged for speech separation. This paper proposes a novel audio-visual-speaker speech separation model that decomposes a monaural speech signal into two speech segments belonging to different speakers by using audio-visual inputs and i-vector speaker embeddings. The proposed model uses a BLSTM network to generate complex time-frequency masks that are applied to the spectrogram of the acoustic speech mixture. We train and evaluate the model on a speech separation task derived from the VoxCeleb2 dataset and show the effectiveness of the method.
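The masking step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the mixture is available as a complex STFT array and that one complex mask per speaker has already been estimated. The function name `apply_complex_masks`, the array shapes, and the use of oracle ratio masks in place of BLSTM outputs are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def apply_complex_masks(mixture_spec, masks):
    """Apply per-speaker complex masks to a mixture spectrogram.

    mixture_spec: complex array, shape (freq, time)   [hypothetical layout]
    masks:        complex array, shape (speakers, freq, time)
    Returns the estimated per-speaker spectrograms, shape (speakers, freq, time).
    """
    mixture_spec = np.asarray(mixture_spec)
    masks = np.asarray(masks)
    if masks.shape[1:] != mixture_spec.shape:
        raise ValueError("each mask must match the mixture spectrogram shape")
    # Element-wise complex multiplication adjusts both magnitude and phase.
    return masks * mixture_spec[np.newaxis, ...]


# Toy data standing in for two speakers' STFTs (not real speech).
rng = np.random.default_rng(0)
n_freq, n_time = 4, 6
s1 = rng.standard_normal((n_freq, n_time)) + 1j * rng.standard_normal((n_freq, n_time))
s2 = rng.standard_normal((n_freq, n_time)) + 1j * rng.standard_normal((n_freq, n_time))
mixture = s1 + s2

# Oracle complex ratio masks (assumption: a trained network would predict
# these; here they are computed from the known sources for illustration).
ideal_masks = np.stack([s1 / mixture, s2 / mixture])

estimates = apply_complex_masks(mixture, ideal_masks)
```

With oracle masks the estimates reproduce the two sources, which makes the sketch easy to check. In a real system the masks would come from the trained model and the estimates would be inverted back to waveforms with an inverse STFT.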

Original language: English
Title of host publication: 2019 2nd IEEE International Conference on Information Communication and Signal Processing, ICICSP 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 276-280
Number of pages: 5
ISBN (Electronic): 9781728151021
Publication status: Published - Sep 2019
Event: 2nd IEEE International Conference on Information Communication and Signal Processing, ICICSP 2019 - Weihai, China
Duration: 28 Sep 2019 to 30 Sep 2019

Publication series

Name: 2019 2nd IEEE International Conference on Information Communication and Signal Processing, ICICSP 2019

Conference

Conference: 2nd IEEE International Conference on Information Communication and Signal Processing, ICICSP 2019
Country/Territory: China
City: Weihai
Period: 28/09/19 to 30/09/19
