IBACodec: End-to-end speech codec with intra-inter broad attention

Xiaonan Yang, Jinjie Zhou, Deshan Yang, Yunwei Wan, Limin Pan*, Senlin Luo

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Speech compression aims to produce a compact bitstream that represents a speech signal with minimal distortion by eliminating redundant information, which becomes increasingly challenging as the bitrate decreases. However, existing neural speech codecs do not fully exploit the information in previous speech sequences, and blindly learning encoded features fails to remove redundant information effectively, resulting in suboptimal reconstruction quality. In this work, we propose an end-to-end speech codec with intra-inter broad attention, named IBACodec, that efficiently compresses speech across different types of datasets, including LibriTTS, LJSpeech, and others. By designing an intra-inter broad transformer that integrates multi-head attention networks and LSTM, our model captures broad attention with direct context awareness between the intra- and inter-frames of speech. Furthermore, we present a dual-branch conformer for channel-wise modeling to effectively eliminate redundant information. In subjective evaluations using speech at a 24 kHz sampling rate, IBACodec at 6.3 kbps is comparable to SoundStream at 9 kbps and better than Opus at 9 kbps, with about 30% fewer bits. Objective experimental results show that IBACodec outperforms state-of-the-art codecs across a wide range of bitrates, with average ViSQOL, LLR, and CEP improvements of up to 4.97%, 38.94%, and 25.39%, respectively.
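As a rough illustration of the idea in the abstract (not the paper's actual implementation), the sketch below combines multi-head self-attention for intra-frame dependencies with an LSTM for inter-frame context; the class name `IntraInterBroadBlock`, the layer sizes, and the fusion step are all assumptions for illustration only.

```python
import torch
import torch.nn as nn

class IntraInterBroadBlock(nn.Module):
    """Hypothetical sketch: multi-head self-attention captures intra-frame
    dependencies while an LSTM carries inter-frame context; the two paths
    are fused and added back residually. Sizes and fusion are assumptions,
    not the design from the paper."""

    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.lstm = nn.LSTM(dim, dim, batch_first=True)
        self.fuse = nn.Linear(2 * dim, dim)   # merge attention and LSTM paths
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, dim) encoder features
        attn_out, _ = self.attn(x, x, x)      # intra-frame self-attention
        lstm_out, _ = self.lstm(x)            # sequential inter-frame context
        fused = self.fuse(torch.cat([attn_out, lstm_out], dim=-1))
        return self.norm(x + fused)           # residual connection + layer norm

# Toy usage: one utterance, 100 frames, 256-dimensional features.
feats = torch.randn(1, 100, 256)
print(IntraInterBroadBlock()(feats).shape)    # torch.Size([1, 100, 256])
```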

Original language: English
Article number: 103979
Journal: Information Processing and Management
Volume: 62
Issue number: 3
DOI:
Publication status: Published - May 2025
