High Quality and Secure Speech Transmission at Low Bitrate via Semantic-Acoustic Hybrid Coding for Low-Altitude Intelligent Systems

Research output: Contribution to journal › Article › peer-review

Abstract

In low-altitude communication scenarios, voice transmission imposes stringent requirements on encoding methods in terms of quality, efficiency, and security. Maintaining speech clarity and naturalness at low bit rates is essential for reliable task execution, while bandwidth limitations and the risk of voiceprint leakage further complicate system design. Recent neural-network-based codecs have demonstrated remarkable performance in compressing speech into discrete semantic representations with reduced bandwidth consumption. However, these approaches still face challenges in preserving perceptual quality, achieving flexible bandwidth utilization, and preventing speaker identity leakage. To address these limitations, this paper presents a semantic-acoustic hybrid coding framework that integrates pre-trained semantic modeling with lightweight acoustic representations within an end-to-end encoder-decoder architecture. The framework introduces a semantic-guided dynamic masking mechanism that adaptively selects acoustic feature dimensions based on semantic feature density, enabling content-aware bit allocation. Moreover, by substituting voiceprint-related attributes while preserving semantic consistency, it achieves effective speaker anonymization. Experimental results demonstrate that the proposed method provides high-fidelity, semantically enhanced reconstruction, achieving a MUSHRA score of 88 and outperforming conventional single-channel low-bitrate codecs with perceptually near-lossless speech reconstruction. The framework further enables semantic-driven adaptive bitrate control, offering flexible trade-offs between speech quality and bandwidth through hyperparameter tuning during training. In addition, it exhibits strong speaker anonymization capability by generating non-original voice characteristics while preserving semantic coherence. The training scripts and audio demos can be found at https://github.com/az1mus/Hybrid-Semantic-acoustic-Voice-Codec.
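
The abstract does not specify how semantic feature density is measured or how acoustic dimensions are selected, so the following is only a minimal sketch of the general idea of semantic-guided dynamic masking, not the authors' implementation. It assumes, purely for illustration, that density can be approximated by the frame-to-frame change in the semantic features and that each frame keeps a leading block of acoustic dimensions whose size grows with that density. The function names and the parameters `k_min` and `k_max` are hypothetical.

```python
import numpy as np


def semantic_density(semantic: np.ndarray) -> np.ndarray:
    """Hypothetical density proxy: per-frame change in semantic features, scaled to [0, 1].

    semantic has shape (frames, semantic_dims).
    """
    # Difference to the previous frame; the first frame is compared with itself.
    delta = np.diff(semantic, axis=0, prepend=semantic[:1])
    change = np.linalg.norm(delta, axis=1)
    peak = change.max()
    return change / peak if peak > 0 else np.zeros_like(change)


def dynamic_mask(density: np.ndarray, n_dims: int, k_min: int = 2, k_max=None) -> np.ndarray:
    """Boolean mask of shape (frames, n_dims): denser frames keep more acoustic dimensions."""
    k_max = n_dims if k_max is None else k_max
    # Assumed linear mapping from density to the number of kept dimensions.
    keep = np.rint(k_min + density * (k_max - k_min)).astype(int)
    return np.arange(n_dims)[None, :] < keep[:, None]


def apply_mask(acoustic: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Zero out the acoustic dimensions that the mask drops."""
    return np.where(mask, acoustic, 0.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    semantic = rng.normal(size=(6, 16))   # stand-in for pre-trained semantic features
    acoustic = rng.normal(size=(6, 8))    # stand-in for lightweight acoustic features
    mask = dynamic_mask(semantic_density(semantic), n_dims=acoustic.shape[1])
    print("kept dimensions per frame:", mask.sum(axis=1))
    print("masked acoustic shape:", apply_mask(acoustic, mask).shape)
```

In this sketch the number of kept dimensions per frame stands in for the bit budget: frames where the semantic content changes little transmit fewer acoustic dimensions, which is one simple way content-aware bit allocation could be realized.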

Original language: English
Journal: IEEE Transactions on Cognitive Communications and Networking
DOIs
Publication status: Accepted/In press - 2026
Externally published: Yes

Keywords

  • Deep Learning
  • Security
  • Semantic Coding
  • Speech Coding
