TY - GEN
T1 - A GAN Based Codec with Vocal Tract Features for Low Rate Speech Coding
AU - Xu, Liang
AU - Wang, Jing
AU - Wang, Lizhong
AU - Chen, Xiaojiao
AU - Lu, Pincheng
AU - Zhang, Jianqian
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - In this paper, we propose a GAN-based codec for wideband speech coding at low bitrates, which combines a conventional encoder with a neural vocoder that takes lower-dimensional input. The method makes two contributions. First, we add vocal tract features as vocoder input, fusing them with energy features, so that the model improves the generated speech quality under limited input dimensions. Second, we propose a sub-band time-frequency discriminator, which divides the input frequency band into several sub-bands according to auditory perception and independently learns and discriminates the features in each sub-band. Experimental results show that the proposed method reaches state-of-the-art performance for low-bitrate speech coding, with a computational complexity of about 2 GMACs.
AB - In this paper, we propose a GAN-based codec for wideband speech coding at low bitrates, which combines a conventional encoder with a neural vocoder that takes lower-dimensional input. The method makes two contributions. First, we add vocal tract features as vocoder input, fusing them with energy features, so that the model improves the generated speech quality under limited input dimensions. Second, we propose a sub-band time-frequency discriminator, which divides the input frequency band into several sub-bands according to auditory perception and independently learns and discriminates the features in each sub-band. Experimental results show that the proposed method reaches state-of-the-art performance for low-bitrate speech coding, with a computational complexity of about 2 GMACs.
KW - Audio codec
KW - Low complexity
KW - Vector quantization
UR - https://www.scopus.com/pages/publications/105010173762
U2 - 10.1109/ICICSP62589.2024.10809286
DO - 10.1109/ICICSP62589.2024.10809286
M3 - Conference contribution
AN - SCOPUS:105010173762
T3 - 2024 7th International Conference on Information Communication and Signal Processing, ICICSP 2024
SP - 118
EP - 122
BT - 2024 7th International Conference on Information Communication and Signal Processing, ICICSP 2024
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 7th International Conference on Information Communication and Signal Processing, ICICSP 2024
Y2 - 21 September 2024 through 23 September 2024
ER -