Low-Bitrate High-Quality Digital Semantic Communication Based on RVQGAN

Xiaojiao Chen, Jing Wang, Jingxuan Huang*, Ming Zeng, Zhong Zheng, Zesong Fei

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Digital semantic communication has attracted considerable attention owing to its potential for integration with modern digital communication systems and its demonstrated performance gains. However, although it saves transmission bandwidth, digital semantic communication can degrade task performance at the receiver, particularly in low-bitrate scenarios. In this paper, we propose a novel low-bitrate digital semantic communication method for speech transmission, based on a generative model, that achieves high-quality reconstructed speech at low bitrates. In particular, we first investigate a multi-scale semantic codec based on a residual vector quantization generative adversarial network (RVQGAN) model, which extracts semantic information and attains high speech reconstruction quality while transmitting at a low bitrate. We then design a U-Net-based channel noise suppression module that alleviates channel effects at low signal-to-noise ratio (SNR) by restoring high-quality semantic features, improving the performance of the proposed method under challenging channel conditions. Moreover, a Transformer-based code predictor further improves robustness by accounting for both the channel impact and the reconstruction quality. Finally, a three-stage training strategy is presented to ensure the effective operation of the multi-scale semantic codec, the channel noise suppression module, and the code predictor. Experimental results demonstrate that the proposed method, operating at 3 kbps, saves at least 50% of bandwidth while achieving higher speech restoration quality than the baseline method.
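
To make the residual vector quantization idea behind the codec concrete, the following is a minimal NumPy sketch of generic RVQ, not the authors' implementation. The function names, the random codebooks, and the frame rate, stage count, and codebook size used in the bitrate example are all hypothetical and are not taken from the paper. Each stage quantizes whatever the previous stages left unexplained, so the transmitted payload is one codebook index per stage per frame.

```python
import math
import numpy as np

def rvq_encode(x, codebooks):
    """Quantize x with a cascade of codebooks (each of shape [size, dim]).

    Returns one index per stage plus the final unquantized residual.
    """
    residual = np.asarray(x, dtype=float).copy()
    indices = []
    for cb in codebooks:
        # Pick the codeword closest to the current residual.
        dists = np.sum((cb - residual) ** 2, axis=1)
        idx = int(np.argmin(dists))
        indices.append(idx)
        # The next stage only sees what this stage failed to represent.
        residual = residual - cb[idx]
    return indices, residual

def rvq_decode(indices, codebooks):
    """Reconstruct the vector by summing the selected codeword of each stage."""
    out = np.zeros(codebooks[0].shape[1])
    for cb, idx in zip(codebooks, indices):
        out = out + cb[idx]
    return out

def bitrate_bps(frame_rate_hz, num_stages, codebook_size):
    """Bits per second when each frame sends one index per stage."""
    return frame_rate_hz * num_stages * math.log2(codebook_size)

# Hypothetical toy setup: 3 stages, 16 codewords each, 8-dimensional features.
rng = np.random.default_rng(0)
codebooks = [rng.normal(size=(16, 8)) * (0.5 ** s) for s in range(3)]
frame = rng.normal(size=8)
indices, residual = rvq_encode(frame, codebooks)
reconstruction = rvq_decode(indices, codebooks)
```

As an illustration of how a bitrate budget arises under these assumptions, `bitrate_bps(75, 4, 1024)` gives 3000 bits per second, since each of the 75 frames per second carries four 10-bit indices. The paper's actual configuration may differ.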

Original language: English
Journal: IEEE Internet of Things Journal
DOIs: https://doi.org/10.1109/JIOT.2025.3534462
Publication status: Accepted/In press - 2025

Keywords

  • Digital semantic communication
  • generative model
  • low-bitrate
  • speech transmission

Cite this

Chen, X., Wang, J., Huang, J., Zeng, M., Zheng, Z., & Fei, Z. (Accepted/In press). Low-Bitrate High-Quality Digital Semantic Communication Based on RVQGAN. IEEE Internet of Things Journal. https://doi.org/10.1109/JIOT.2025.3534462