A Differential Quantization Based END-TO-END Neural Speech Codec

  • Beijing Institute of Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Speech codecs efficiently compress speech signals, reducing the bandwidth occupied during communication. With the development of neural networks and deep learning, end-to-end speech codecs based on neural network structures have emerged. Compared to traditional codecs, these neural speech codecs can reconstruct higher-quality speech at lower bitrates. However, the performance of neural speech codecs drastically deteriorates when the communication bitrate drops to 1 kbps or below, as these codecs are based on residual quantization, which has limited performance at low bitrates. In this paper, a differential quantization based neural speech codec is proposed. In particular, the quantization focuses on the importance of difference frames and preserves key information with as few bits as possible. Meanwhile, we propose a compensator to further improve the reconstructed speech quality. Both subjective and objective evaluations demonstrate that our proposed method can achieve a higher quality of reconstructed speech at 0.6 kbps than SoundStream at 3 kbps. The entire model is causal, supporting streaming and real-time inference.

Original language: English
Title of host publication: 2024 14th International Symposium on Chinese Spoken Language Processing, ISCSLP 2024
Editors: Yanmin Qian, Qin Jin, Zhijian Ou, Zhenhua Ling, Zhiyong Wu, Ya Li, Lei Xie, Jianhua Tao
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 71-75
Number of pages: 5
ISBN (electronic): 9798331516826
DOI
Publication status: Published - 2024
Event: 14th International Symposium on Chinese Spoken Language Processing, ISCSLP 2024 - Beijing, China
Duration: 7 Nov 2024 to 10 Nov 2024

Publication series

Name: 2024 14th International Symposium on Chinese Spoken Language Processing, ISCSLP 2024

Conference

Conference: 14th International Symposium on Chinese Spoken Language Processing, ISCSLP 2024
Country/Territory: China
City: Beijing
Period: 7/11/24 to 10/11/24
