IBACodec: End-to-end speech codec with intra-inter broad attention

Xiaonan Yang, Jinjie Zhou, Deshan Yang, Yunwei Wan, Limin Pan*, Senlin Luo

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Speech compression aims to produce a compact bitstream that represents a speech signal with minimal distortion by eliminating redundant information, a task that becomes increasingly challenging as the bitrate decreases. However, existing neural speech codecs do not fully exploit the information in preceding speech sequences, and learning encoded features blindly removes redundant information ineffectively, resulting in suboptimal reconstruction quality. In this work, we propose an end-to-end speech codec with intra-inter broad attention, named IBACodec, that efficiently compresses speech across datasets of different types, including LibriTTS, LJSpeech, and others. By designing an intra-inter broad transformer that integrates multi-head attention networks and an LSTM, our model captures broad attention with direct context awareness between the intra- and inter-frames of speech. Furthermore, we present a dual-branch conformer for channel-wise modeling to effectively eliminate redundant information. In subjective evaluations on speech at a 24 kHz sampling rate, IBACodec at 6.3 kbps is comparable to SoundStream at 9 kbps and better than Opus at 9 kbps, using about 30% fewer bits. Objective experimental results show that IBACodec outperforms state-of-the-art codecs across a wide range of bitrates, with average ViSQOL, LLR, and CEP improvements of up to 4.97%, 38.94%, and 25.39%, respectively.
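
The core architectural idea in the abstract, pairing per-frame (intra) self-attention with recurrent inter-frame context, can be sketched roughly as below. This is not the authors' implementation: the module name, dimensions, fusion scheme, and the residual/normalization layout are illustrative assumptions, using standard PyTorch multi-head attention and LSTM layers.

# Minimal sketch (assumed, not the paper's code) of a block that combines
# multi-head self-attention (intra-frame) with an LSTM (inter-frame context),
# loosely following the abstract's description of the intra-inter broad transformer.
import torch
import torch.nn as nn

class IntraInterBroadBlock(nn.Module):
    def __init__(self, d_model: int = 256, n_heads: int = 4):
        super().__init__()
        # Intra path: self-attention over positions within the current sequence.
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Inter path: LSTM carries context forward from previous frames.
        self.lstm = nn.LSTM(d_model, d_model, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, d_model) latent features from an encoder.
        attn_out, _ = self.attn(x, x, x)        # intra-frame attention
        x = self.norm1(x + attn_out)            # residual + layer norm
        lstm_out, _ = self.lstm(x)              # inter-frame recurrence
        return self.norm2(x + lstm_out)         # residual + layer norm

if __name__ == "__main__":
    feats = torch.randn(2, 100, 256)            # 2 utterances, 100 frames each
    print(IntraInterBroadBlock()(feats).shape)  # torch.Size([2, 100, 256])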

Original language: English
Article number: 103979
Journal: Information Processing and Management
Volume: 62
Issue number: 3
DOI: https://doi.org/10.1016/j.ipm.2024.103979
Publication status: Published - May 2025

Keywords

  • Intra-inter
  • Neural networks
  • Speech coding
  • Transformers
  • VQ-VAE

Cite this

Yang, X., Zhou, J., Yang, D., Wan, Y., Pan, L., & Luo, S. (2025). IBACodec: End-to-end speech codec with intra-inter broad attention. Information Processing and Management, 62(3), Article 103979. https://doi.org/10.1016/j.ipm.2024.103979