Robust speech recognition based on multi-objective learning with GRU network

Ming Liu, Yujun Wang, Zhaoyu Yan, Jing Wang, Xiang Xie

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

This paper proposes a new scheme for speech enhancement (SE) aimed at recognition, based on a multi-objective learning method that uses three objectives when training a gated recurrent unit (GRU) network. The first objective is the main target for the SE task: directly mapping noisy log-power spectrum (LPS) features to clean Mel-frequency cepstral coefficient (MFCC) features. The second is an auxiliary target that helps improve the main one by learning additional information from the backend acoustic model (AM). The third is also an auxiliary target, which learns from the mapping of noisy LPS to clean LPS. The two auxiliary structures help the original structure optimize the network parameters by correcting errors. This approach imposes more constraints on the direct feature mapping and passes information from the acoustic model to the network, enabling the enhancement network to better serve the AM. Experimental results show that the new multi-objective scheme, with joint feature mapping and posterior probability learning, improves SE performance and significantly lowers the character error rate (CER) of the AM compared with the baseline deep neural network (DNN).

Note: This work was done when Ming Liu was an intern in the Speech Group, Xiaomi Corporation, Beijing, China.
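The three objectives described in the abstract can be pictured as a weighted sum of one main loss and two auxiliary losses. The sketch below is only illustrative: the abstract does not give the loss functions or weights, so the use of mean squared error for the two feature-mapping targets, cross-entropy against AM posteriors for the second objective, and the weights `alpha` and `beta` are all assumptions, as are the function names.

```python
import math

def mse(pred, target):
    """Mean squared error between two equal-length feature vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def cross_entropy(pred_probs, target_probs, eps=1e-12):
    """Cross-entropy of predicted posteriors against target posteriors."""
    return -sum(t * math.log(p + eps) for p, t in zip(pred_probs, target_probs))

def multi_objective_loss(mfcc_pred, mfcc_clean,
                         post_pred, post_am,
                         lps_pred, lps_clean,
                         alpha=0.1, beta=0.1):
    """Combine one main and two auxiliary objectives (hypothetical weighting).

    main    : noisy LPS -> clean MFCC mapping error (assumed MSE)
    aux_am  : mismatch with acoustic-model posteriors (assumed cross-entropy)
    aux_lps : noisy LPS -> clean LPS mapping error (assumed MSE)
    alpha, beta are illustrative weights, not values from the paper.
    """
    main = mse(mfcc_pred, mfcc_clean)
    aux_am = cross_entropy(post_pred, post_am)
    aux_lps = mse(lps_pred, lps_clean)
    return main + alpha * aux_am + beta * aux_lps
```

In a real system the three predictions would come from separate output layers on top of a shared GRU, so that gradients from both auxiliary terms also shape the shared parameters; here they are plain lists so the weighting itself is easy to inspect.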

Original language: English
Title of host publication: 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 181-185
Number of pages: 5
ISBN (Electronic): 9781728132488
Publication status: Published - Nov 2019
Event: 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019 - Lanzhou, China
Duration: 18 Nov 2019 to 21 Nov 2019

Publication series

Name: 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019

Conference

Conference: 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019
Country/Territory: China
City: Lanzhou
Period: 18/11/19 to 21/11/19
