TY - JOUR
T1 - High Quality and Secure Speech Transmission at Low Bitrate via Semantic-Acoustic Hybrid Coding for Low-Altitude Intelligent Systems
AU - Chen, Bo
AU - An, Jianping
AU - Gui, Bowen
AU - Zeng, Liang
N1 - Publisher Copyright:
© 2015 IEEE.
PY - 2026
Y1 - 2026
N2 - In low-altitude communication scenarios, voice transmission imposes stringent requirements on encoding methods in terms of quality, efficiency, and security. Maintaining speech clarity and naturalness at low bitrates is essential for reliable task execution, while bandwidth limitations and the risk of voiceprint leakage further complicate system design. Recent neural-network-based codecs have demonstrated remarkable performance in compressing speech into discrete semantic representations with reduced bandwidth consumption. However, these approaches still face challenges in preserving perceptual quality, achieving flexible bandwidth utilization, and preventing speaker identity leakage. To address these limitations, this paper presents a semantic-acoustic hybrid coding framework that integrates pre-trained semantic modeling with lightweight acoustic representations within an end-to-end encoder-decoder architecture. The framework introduces a semantic-guided dynamic masking mechanism that adaptively selects acoustic feature dimensions based on semantic feature density, enabling intelligent, content-aware bit allocation. Moreover, by substituting voiceprint-related attributes while preserving semantic consistency, it achieves effective speaker anonymization. Experimental results demonstrate that the proposed method provides high-fidelity reconstruction with semantic enhancement, achieving a MUSHRA score of 88 and outperforming conventional single-channel low-bitrate codecs with near-perceptually lossless speech reconstruction. The framework further enables semantic-driven adaptive bitrate control, offering flexible trade-offs between speech quality and bandwidth through hyperparameter tuning during training. In addition, it exhibits strong speaker anonymization capability by generating non-original voice characteristics while preserving semantic coherence.
The training scripts and audio demos can be found at https://github.com/az1mus/Hybrid-Semantic-acoustic-Voice-Codec.
AB - In low-altitude communication scenarios, voice transmission imposes stringent requirements on encoding methods in terms of quality, efficiency, and security. Maintaining speech clarity and naturalness at low bitrates is essential for reliable task execution, while bandwidth limitations and the risk of voiceprint leakage further complicate system design. Recent neural-network-based codecs have demonstrated remarkable performance in compressing speech into discrete semantic representations with reduced bandwidth consumption. However, these approaches still face challenges in preserving perceptual quality, achieving flexible bandwidth utilization, and preventing speaker identity leakage. To address these limitations, this paper presents a semantic-acoustic hybrid coding framework that integrates pre-trained semantic modeling with lightweight acoustic representations within an end-to-end encoder-decoder architecture. The framework introduces a semantic-guided dynamic masking mechanism that adaptively selects acoustic feature dimensions based on semantic feature density, enabling intelligent, content-aware bit allocation. Moreover, by substituting voiceprint-related attributes while preserving semantic consistency, it achieves effective speaker anonymization. Experimental results demonstrate that the proposed method provides high-fidelity reconstruction with semantic enhancement, achieving a MUSHRA score of 88 and outperforming conventional single-channel low-bitrate codecs with near-perceptually lossless speech reconstruction. The framework further enables semantic-driven adaptive bitrate control, offering flexible trade-offs between speech quality and bandwidth through hyperparameter tuning during training. In addition, it exhibits strong speaker anonymization capability by generating non-original voice characteristics while preserving semantic coherence.
The training scripts and audio demos can be found at https://github.com/az1mus/Hybrid-Semantic-acoustic-Voice-Codec.
KW - Deep Learning
KW - Security
KW - Semantic Coding
KW - Speech Coding
UR - https://www.scopus.com/pages/publications/105028482701
U2 - 10.1109/TCCN.2026.3657147
DO - 10.1109/TCCN.2026.3657147
M3 - Article
AN - SCOPUS:105028482701
SN - 2332-7731
JO - IEEE Transactions on Cognitive Communications and Networking
JF - IEEE Transactions on Cognitive Communications and Networking
ER -