
A GAN Based Codec with Vocal Tract Features for Low Rate Speech Coding

  • Liang Xu
  • Jing Wang*
  • Lizhong Wang
  • Xiaojiao Chen
  • Pincheng Lu
  • Jianqian Zhang
  • *Corresponding author for this work
  • Beijing Institute of Technology
  • Samsung

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-reviewed

Abstract

In this paper, we propose a GAN-based codec for wideband speech coding at low bitrates. It consists of a conventional encoder and a neural vocoder with lower-dimensional input. The method makes two contributions. First, we add vocal tract features to the vocoder input and fuse them with energy features, so that the model improves the quality of the generated speech under a limited input dimension. Second, we propose a sub-band time-frequency discriminator, which divides the input frequency band into several sub-bands according to auditory perception and learns and discriminates the features in each sub-band independently. Experimental results show that the proposed method reaches state-of-the-art performance for low-bitrate speech coding, with a computational complexity of about 2 GMACs.
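The abstract does not say how the sub-band edges are chosen, so the following is only a rough sketch of the sub-band splitting step, not the authors' implementation. It assumes mel-scale spacing as a stand-in for "auditory" partitioning, a 16 kHz sampling rate, a 512-point FFT, and four sub-bands. All of these values and the function names `subband_bin_edges` and `split_subbands` are hypothetical.

```python
import math


def hz_to_mel(f_hz):
    # Common HTK-style mel formula (an assumed auditory scale, not from the paper).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def subband_bin_edges(n_fft=512, sample_rate=16000, n_bands=4):
    """Return n_bands + 1 STFT bin indices that are equally spaced on the mel scale.

    All default values are illustrative assumptions.
    """
    nyquist = sample_rate / 2.0
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(nyquist)
    edges = []
    for i in range(n_bands + 1):
        hz = mel_to_hz(mel_max * i / n_bands)
        edges.append(round(hz / nyquist * (n_bins - 1)))
    edges[0] = 0          # first band starts at the DC bin
    edges[-1] = n_bins    # last band includes the Nyquist bin
    return edges


def split_subbands(spec, edges):
    """Split a spectrogram (list of frequency-bin rows) into consecutive sub-bands."""
    return [spec[lo:hi] for lo, hi in zip(edges[:-1], edges[1:])]


# Usage: a dummy spectrogram with 257 frequency bins and 10 frames.
spec = [[0.0] * 10 for _ in range(257)]
bands = split_subbands(spec, subband_bin_edges())
```

In a discriminator of this kind, each element of `bands` would presumably be passed to its own sub-network, so that low-frequency bands, which are narrower on a mel-like scale, are judged separately from the wider high-frequency bands.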

Original language: English
Title of host publication: 2024 7th International Conference on Information Communication and Signal Processing, ICICSP 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 118-122
Number of pages: 5
ISBN (electronic): 9798350355895
DOI
Publication status: Published - 2024
Externally published: Yes
Event: 7th International Conference on Information Communication and Signal Processing, ICICSP 2024 - Zhoushan, China
Duration: 21 Sep 2024 to 23 Sep 2024

Publication series

Name: 2024 7th International Conference on Information Communication and Signal Processing, ICICSP 2024

Conference

Conference: 7th International Conference on Information Communication and Signal Processing, ICICSP 2024
Country/Territory: China
City: Zhoushan
Period: 21/09/24 to 23/09/24
