Audio-Visual Speech Separation Using I-Vectors

Yiyu Luo, Jing Wang, Xinyao Wang, Liang Wen, Lizhong Wang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

7 Citations (Scopus)

Abstract

Speech separation is the task of extracting target speech from background interference. In applications such as home devices or office meetings, prior knowledge about the possible speakers is available and can be leveraged for speech separation. This paper proposes a novel audio-visual-speaker speech separation model that decomposes a monaural speech signal into two speech segments belonging to different speakers, by making use of audio-visual inputs and i-vector speaker embeddings. The proposed model is based on a BLSTM network that generates complex time-frequency masks, which are applied to the acoustic mixed-speech spectrogram. We train and evaluate our model on a speech separation task derived from the VoxCeleb2 dataset and show the effectiveness of the method.
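The mask-and-multiply step described in the abstract can be sketched as follows. This is a minimal NumPy illustration only: shapes and variable names are assumed, and the BLSTM's predicted complex mask (which in the paper would be conditioned on visual features and the i-vector) is replaced by a random placeholder.

```python
import numpy as np

rng = np.random.default_rng(0)
T, F = 100, 257  # time frames x frequency bins (illustrative STFT shape)

# Complex spectrogram of the two-speaker mixture (random stand-in).
mix_spec = rng.standard_normal((T, F)) + 1j * rng.standard_normal((T, F))

# A complex ratio mask for one target speaker; in the paper this would be
# the BLSTM's output, not random values as here.
mask = rng.uniform(-1, 1, (T, F)) + 1j * rng.uniform(-1, 1, (T, F))

# Elementwise complex multiplication recovers the target's spectrogram,
# modifying both magnitude and phase (unlike a real-valued magnitude mask).
target_spec = mask * mix_spec
```

Repeating this with a second mask yields the other speaker's spectrogram; an inverse STFT then returns each estimate to the time domain.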

Original language: English
Title of host publication: 2019 2nd IEEE International Conference on Information Communication and Signal Processing, ICICSP 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 276-280
Number of pages: 5
ISBN (Electronic): 9781728151021
DOIs
Publication status: Published - Sept 2019
Event: 2nd IEEE International Conference on Information Communication and Signal Processing, ICICSP 2019 - Weihai, China
Duration: 28 Sept 2019 - 30 Sept 2019

Publication series

Name: 2019 2nd IEEE International Conference on Information Communication and Signal Processing, ICICSP 2019

Conference

Conference: 2nd IEEE International Conference on Information Communication and Signal Processing, ICICSP 2019
Country/Territory: China
City: Weihai
Period: 28/09/19 - 30/09/19

Keywords

  • Audio-visual speech separation
  • Cocktail party problem
  • I-vectors
  • Speaker embeddings

