Neural Audio Coding with Deep Complex Networks

Jiawei Ru, Lizhong Wang, Maoshen Jia, Liang Wen, Chunxi Wang, Yuhao Zhao, Jing Wang

Research output: Contribution to journal › Conference article › peer-review

1 Citation (Scopus)

Abstract

This paper proposes a transform-domain audio coding method based on deep complex networks. In the proposed codec, the time-frequency spectrum of the audio signal is fed to an encoder consisting of complex convolutional blocks and a frequency-temporal modeling module; the extracted features are then quantized at a target bitrate by a vector quantizer. The decoder, which reconstructs the time-frequency spectrum of the audio from the quantized features, is symmetrical to the encoder. In this paper, a structure combining a complex multi-head self-attention module and a complex long short-term memory is proposed to capture both frequency and temporal dependencies. Subjective and objective evaluation tests show the advantage of the proposed method.
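The abstract names two building blocks that can be sketched concisely: a complex-valued convolution (the core operation of deep complex networks, computed as four real convolutions) and nearest-neighbour vector quantization of encoder features. The NumPy sketch below is illustrative only and is not the authors' implementation; all function names, shapes, and the 1-D setting are assumptions for demonstration.

```python
import numpy as np

def complex_conv1d(x, w_r, w_i):
    """Complex 1-D convolution via four real convolutions, using
    (W_r + iW_i) * (x_r + ix_i) = (W_r*x_r - W_i*x_i) + i(W_r*x_i + W_i*x_r).
    x: complex signal, w_r / w_i: real and imaginary kernel parts."""
    x_r, x_i = x.real, x.imag
    real = np.convolve(x_r, w_r, mode="valid") - np.convolve(x_i, w_i, mode="valid")
    imag = np.convolve(x_i, w_r, mode="valid") + np.convolve(x_r, w_i, mode="valid")
    return real + 1j * imag

def vector_quantize(z, codebook):
    """Replace each feature vector with its nearest codebook entry (L2).
    z: (T, D) features; codebook: (K, D). Returns quantized features and indices."""
    dists = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)  # (T, K)
    idx = dists.argmin(axis=1)
    return codebook[idx], idx
```

At a fixed codebook size K, each feature vector costs log2(K) bits, which is how a vector quantizer pins the codec to a target bitrate.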

Original language: English
Article number: 012005
Journal: Journal of Physics: Conference Series
Volume: 2759
Issue number: 1
DOIs
Publication status: Published - 2024
Event: 2024 8th International Conference on Machine Vision and Information Technology, CMVIT 2024 - Hybrid, Singapore, Singapore
Duration: 23 Feb 2024 - 25 Feb 2024
