A GAN Based Codec with Vocal Tract Features for Low Rate Speech Coding

  • Liang Xu
  • Jing Wang*
  • Lizhong Wang
  • Xiaojiao Chen
  • Pincheng Lu
  • Jianqian Zhang

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In this paper, we propose a GAN-based codec for wideband speech coding at low bitrates. It combines a conventional encoder with a neural vocoder that takes a lower-dimensional input. The method makes two contributions. First, we add vocal tract features to the vocoder input and fuse them with energy features, which improves the quality of the generated speech under a limited input dimension. Second, we propose a sub-band time-frequency discriminator that divides the input frequency band into several sub-bands according to auditory perception and learns and discriminates the features of each sub-band independently. Experimental results show that the proposed method reaches state-of-the-art performance for low-bitrate speech coding at a computational complexity of about 2 GMACs.
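The sub-band split described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the band edges are chosen, so this example assumes edges equally spaced on the mel scale as one possible reading of "according to auditory perception". The 16 kHz sampling rate, the four bands, and the function names (`subband_bin_edges`, `split_subbands`) are likewise hypothetical and are not taken from the paper. Only the split is shown; in a discriminator of this kind, each slice would presumably be passed to its own sub-network.

```python
import numpy as np

def hz_to_mel(f_hz):
    # Common (HTK-style) mel formula; an assumed choice of auditory scale.
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def subband_bin_edges(n_bins, sample_rate, n_bands):
    """Return n_bands + 1 STFT-bin indices with edges equally spaced in mel."""
    nyquist = sample_rate / 2.0
    edges_hz = mel_to_hz(np.linspace(0.0, hz_to_mel(nyquist), n_bands + 1))
    edges = np.round(edges_hz / nyquist * (n_bins - 1)).astype(int)
    edges[-1] = n_bins  # make the last band include the Nyquist bin
    return edges

def split_subbands(spec, sample_rate, n_bands):
    """Split a (n_bins, n_frames) spectrogram into per-band slices."""
    edges = subband_bin_edges(spec.shape[0], sample_rate, n_bands)
    return [spec[lo:hi] for lo, hi in zip(edges[:-1], edges[1:])]

# Usage: a 257-bin spectrogram (512-point FFT) with 100 frames at 16 kHz.
spec = np.abs(np.random.randn(257, 100))
bands = split_subbands(spec, sample_rate=16000, n_bands=4)
print([b.shape for b in bands])
```

With mel-spaced edges the low-frequency bands are narrow and the high-frequency bands are wide, so each sub-network sees a frequency range of roughly equal perceptual width.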

Original language: English
Title of host publication: 2024 7th International Conference on Information Communication and Signal Processing, ICICSP 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 118-122
Number of pages: 5
ISBN (Electronic): 9798350355895
Publication status: Published - 2024
Externally published: Yes
Event: 7th International Conference on Information Communication and Signal Processing, ICICSP 2024 - Zhoushan, China
Duration: 21 Sept 2024 → 23 Sept 2024

Publication series

Name: 2024 7th International Conference on Information Communication and Signal Processing, ICICSP 2024

Conference

Conference: 7th International Conference on Information Communication and Signal Processing, ICICSP 2024
Country/Territory: China
City: Zhoushan
Period: 21/09/24 → 23/09/24

Keywords

  • Audio codec
  • Low complexity
  • Vector quantization
