@inproceedings{f823eceb723f40308e478ec8654860bf,
  title     = {Audio-Visual Speech Separation Using {I-Vectors}},
  abstract  = {Speech separation is the task of extracting target speech from background interference. In applications like home devices or office meeting, prior knowledge about possible speaker is available, which can be leveraged for speech separation. This paper proposes a novel audio-visual-speaker speech separation model that decomposes a monaural speech signal into two speech segments belonging to different speakers, by making use of audio-visual inputs and i-vector speaker embeddings. The proposed model is based on a BLSTM network to generate complex time-frequency masks which can be applied to the acoustic mixed-speech spectrogram. We train and evaluate our model on a speech separation task derived from the VoxCeleb2 dataset and show effectiveness of the method.},
  keywords  = {Audio-visual speech separation, Cocktail party problem, I-vectors, Speaker embeddings},
  author    = {Luo, Yiyu and Wang, Jing and Wang, Xinyao and Wen, Liang and Wang, Lizhong},
  note      = {Publisher Copyright: {\textcopyright} 2019 IEEE.; 2nd IEEE International Conference on Information Communication and Signal Processing, ICICSP 2019 ; Conference date: 28-09-2019 Through 30-09-2019},
  year      = {2019},
  month     = sep,
  doi       = {10.1109/ICICSP48821.2019.8958547},
  language  = {English},
  series    = {2019 2nd {IEEE} International Conference on Information Communication and Signal Processing, {ICICSP} 2019},
  publisher = {Institute of Electrical and Electronics Engineers Inc.},
  pages     = {276--280},
  booktitle = {2019 2nd {IEEE} International Conference on Information Communication and Signal Processing, {ICICSP} 2019},
  address   = {United States},
  internal-note = {review: address holds a country; BibTeX address should be the publisher's city -- verify against the publisher record},
}