A Differential Quantization Based END-TO-END Neural Speech Codec

Pincheng Lu, Liang Xu, Jing Wang

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Speech codecs efficiently compress speech signals, reducing the bandwidth occupied during communication. With the development of neural networks and deep learning, end-to-end speech codecs based on neural network structures have emerged. Compared to traditional codecs, these neural speech codecs can reconstruct higher-quality speech at lower bitrates. However, the performance of neural speech codecs drastically deteriorates when the communication bitrate drops to 1 kbps or below, as these codecs are based on residual quantization, which has limited performance at low bitrates. In this paper, a differential quantization based neural speech codec is proposed. In particular, the quantization focuses on the importance of difference frames and preserves key information with as few bits as possible. Meanwhile, we propose a compensator to further improve the reconstructed speech quality. Both subjective and objective evaluations demonstrate that our proposed method can achieve a higher quality of reconstructed speech at 0.6 kbps than SoundStream at 3 kbps. The entire model is causal, supporting streaming and real-time inference.
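As a rough illustration of the two ideas the abstract relies on, the sketch below shows (a) how the bitrate of a vector-quantization codec follows from its frame rate and codebook sizes, and (b) a generic causal differential quantizer that codes each frame as a difference from the previously reconstructed frame. It is not the authors' model: the function names, the toy codebook, and the 50 frames-per-second / 4096-entry configuration are assumptions chosen only because they multiply out to 0.6 kbps. The abstract does not state the paper's actual frame rate, codebook size, or compensator design.

```python
import math


def bitrate_bps(frames_per_second, codebook_sizes):
    """Bits per second when one index per codebook is sent for every frame."""
    bits_per_frame = sum(math.log2(size) for size in codebook_sizes)
    return frames_per_second * bits_per_frame


def nearest_index(vector, codebook):
    """Index of the codebook entry closest to `vector` (squared Euclidean)."""
    def sq_dist(entry):
        return sum((v - e) ** 2 for v, e in zip(vector, entry))
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))


def differential_encode(frames, codebook):
    """Quantize each frame's difference from the previous reconstructed frame.

    Only past frames are used, so the loop is causal and can run in streaming
    fashion. This is a textbook differential scheme, assumed for illustration.
    """
    prev = [0.0] * len(codebook[0])  # decoder state starts at zero
    indices = []
    for frame in frames:
        diff = [f - p for f, p in zip(frame, prev)]
        idx = nearest_index(diff, codebook)
        indices.append(idx)
        # Track what the decoder will reconstruct, not the clean input,
        # so quantization error does not accumulate unnoticed.
        prev = [p + c for p, c in zip(prev, codebook[idx])]
    return indices


def differential_decode(indices, codebook):
    """Rebuild frames by accumulating the quantized differences."""
    prev = [0.0] * len(codebook[0])
    frames = []
    for idx in indices:
        prev = [p + c for p, c in zip(prev, codebook[idx])]
        frames.append(prev)
    return frames


# Hypothetical toy codebook of 2-D difference vectors.
toy_codebook = [[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1]]
toy_frames = [[1, 0], [1, 1], [0, 1]]

codes = differential_encode(toy_frames, toy_codebook)
decoded = differential_decode(codes, toy_codebook)
```

With these assumed numbers, `bitrate_bps(50, [4096])` evaluates to 600 bit/s, since a 4096-entry codebook costs 12 bits per frame. A residual quantizer at the same frame rate would pay that cost once per stage, which is one way to see why stacking residual stages becomes expensive at very low bitrates.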

Original language: English
Title of host publication: 2024 14th International Symposium on Chinese Spoken Language Processing, ISCSLP 2024
Editors: Yanmin Qian, Qin Jin, Zhijian Ou, Zhenhua Ling, Zhiyong Wu, Ya Li, Lei Xie, Jianhua Tao
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 71-75
Number of pages: 5
ISBN (Electronic): 9798331516826
Publication status: Published - 2024
Event: 14th International Symposium on Chinese Spoken Language Processing, ISCSLP 2024 - Beijing, China
Duration: 7 Nov 2024 - 10 Nov 2024

Publication series

Name: 2024 14th International Symposium on Chinese Spoken Language Processing, ISCSLP 2024

Conference

Conference: 14th International Symposium on Chinese Spoken Language Processing, ISCSLP 2024
Country/Territory: China
City: Beijing
Period: 7/11/24 - 10/11/24

Keywords

  • low bitrate
  • neural speech codec
  • vector quantization
