A Chinese Speech Recognition System Based on Fusion Network Structure

Lunvi Guo, Shining Mu, Chaofan Shi, Bo Yan, Zhouling Xiao, Sheng Yu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Citations (Scopus)

Abstract

The purpose of an automatic speech recognition system is to convert speech into readable text. Chinese has many homophones: characters with the same pronunciation but different written forms carry different meanings. At present, there has been relatively little research on Chinese speech recognition. We therefore propose a Chinese automatic speech recognition system based on the fusion network RRAINet, with an end-to-end structure comprising an acoustic model and a language model. We treat speech recognition as a visual problem and preprocess the data using the Mel spectrum and SpecAugment. The model is trained with the connectionist temporal classification (CTC) criterion and decoded with a greedy algorithm, converting speech signals into Chinese characters. Experiments show that the model's phoneme error rate is 12.56% on the dev set and 12.38% on the test set of Free ST (ST-CMDS-20170001_1-OS). The corresponding word error rates are 18.79% and 18.74%, about 5% lower than the baseline VGG-CTC model.
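The greedy decoding step and the error-rate metric mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the blank index of 0, and the per-frame score layout (one list of label scores per time step) are assumptions made for illustration only. Greedy CTC decoding takes the highest-scoring label in each frame, merges consecutive repeats, and removes blanks; the error rate is the edit distance between reference and hypothesis divided by the reference length.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank symbol (hypothetical choice)


def ctc_greedy_decode(frame_scores, blank=BLANK):
    """Pick the best label per frame, merge repeats, then drop blanks."""
    best_path = [max(range(len(frame)), key=frame.__getitem__)
                 for frame in frame_scores]
    collapsed = [label for label, _ in groupby(best_path)]
    return [label for label in collapsed if label != blank]


def error_rate(reference, hypothesis):
    """Levenshtein distance between two sequences over the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    prev = list(range(len(hypothesis) + 1))
    for i, ref_tok in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_tok in enumerate(hypothesis, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(reference)


# Toy scores over three labels (0 = blank); not data from the paper.
scores = [
    [0.1, 0.8, 0.1],
    [0.1, 0.7, 0.2],
    [0.9, 0.05, 0.05],
    [0.2, 0.6, 0.2],
    [0.1, 0.2, 0.7],
    [0.1, 0.1, 0.8],
]
decoded = ctc_greedy_decode(scores)
print(decoded, error_rate([1, 1, 2], decoded))
```

The blank frame between the two runs of label 1 is what lets CTC emit the same label twice in a row. The same `error_rate` function applies to phoneme or character sequences, depending on which units are passed in.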

Original language: English
Title of host publication: 2021 IEEE 21st International Conference on Communication Technology, ICCT 2021
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1271-1276
Number of pages: 6
ISBN (Electronic): 9781665432061
DOIs
Publication status: Published - 2021
Externally published: Yes
Event: 21st IEEE International Conference on Communication Technology, ICCT 2021 - Tianjin, China
Duration: 13 Oct 2021 → 16 Oct 2021

Publication series

Name: International Conference on Communication Technology Proceedings, ICCT
Volume: 2021-October

Conference

Conference: 21st IEEE International Conference on Communication Technology, ICCT 2021
Country/Territory: China
City: Tianjin
Period: 13/10/21 → 16/10/21

Keywords

  • CTC
  • data preprocessing
  • Fusion structure
  • Markov language model
  • speech recognition
