
Multi-Feature Fusion Audio-Visual Joint Speech Separation Algorithm Based on Conv-TasNet

Research output: Contribution to journal, Article, peer-reviewed

Abstract

Audio-visual multimodal modeling has been shown to be effective for speech separation. This paper proposes a speech separation model that improves on the existing time-domain audio-visual joint speech separation algorithm by strengthening the connection between the audio and visual streams. Because existing audio-visual separation models are not tightly integrated, the authors propose an end-to-end model that combines audio features with the additional input visual features multiple times in the time domain and adds vertical weight sharing. The model was trained and evaluated on the GRID dataset. Experiments show that, compared with Conv-TasNet using audio only and Conv-TasNet combining audio and video, the proposed model improves performance by 1.2 dB and 0.4 dB, respectively.

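Translated title of the contribution: Multi Feature Fusion Audio-visual Joint Speech Separation Algorithm Based on Conv-TasNet
Original language: Traditional Chinese
Pages (from-to): 1799-1805
Number of pages: 7
Journal: Journal of Signal Processing
Volume: 37
Issue number: 10
Publication status: Published - Oct 2021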

Keywords

  • audio separation
  • audio-visual joint
  • deep neural network
  • multi feature fusion
