Robust speech recognition based on multi-objective learning with GRU network

Ming Liu; Yujun Wang; Zhaoyu Yan; Jing Wang; Xiang Xie

doi:10.1109/APSIPAASC47483.2019.9023325

School of Information and Electronics

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

This paper proposes a new scheme for speech enhancement (SE) aimed at recognition, based on a multi-objective learning method that uses three objectives in the training procedure of a gated recurrent unit (GRU) network. The first objective is the main target of the SE task: directly mapping noisy log-power spectrum (LPS) features to clean Mel-frequency cepstral coefficient (MFCC) features. The second is an auxiliary target that helps improve the main one by learning additional information from the backend acoustic model (AM). The third is also an auxiliary target, achieved by learning information from the mapping of noisy LPS to clean LPS. The two auxiliary structures help the original structure optimize the network parameters by correcting errors. This approach imposes more constraints on the direct feature mapping and passes information from the acoustic model to the network, enabling the enhancement network to better serve the AM. The experimental results show that the new multi-objective scheme, with joint feature mapping and posterior probability learning, improves the performance of SE. The scheme also significantly lowers the character error rate (CER) of the AM compared to the baseline deep neural network (DNN). (This work was done while Ming Liu was an intern in the Speech Group, Xiaomi Corporation, Beijing, China.)
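The three training objectives described in the abstract can be pictured as one combined loss. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not give the loss functions or their weights, so the use of mean squared error for the two feature-mapping targets, cross-entropy for the acoustic-model posteriors, and the weights `w_post` and `w_lps` are all hypothetical choices made for illustration.

```python
import math


def mse(pred, target):
    """Mean squared error between two equal-length feature vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def cross_entropy(pred_probs, target_probs, eps=1e-12):
    """Cross-entropy between predicted and target posterior distributions."""
    return -sum(t * math.log(p + eps) for p, t in zip(pred_probs, target_probs))


def multi_objective_loss(mfcc_pred, mfcc_clean,
                         post_pred, post_am,
                         lps_pred, lps_clean,
                         w_post=0.1, w_lps=0.1):
    """Combine one main and two auxiliary objectives for a single frame.

    The loss types and the weights w_post and w_lps are assumptions;
    the abstract only states that three objectives are trained jointly.
    """
    # Main objective: noisy LPS -> clean MFCC feature mapping.
    main = mse(mfcc_pred, mfcc_clean)
    # Auxiliary objective 1: match posteriors from the backend acoustic model.
    aux_post = cross_entropy(post_pred, post_am)
    # Auxiliary objective 2: noisy LPS -> clean LPS feature mapping.
    aux_lps = mse(lps_pred, lps_clean)
    return main + w_post * aux_post + w_lps * aux_lps
```

In a real system each prediction would come from a separate output head of a shared GRU network, and the combined loss would be backpropagated through the shared layers so that the auxiliary targets constrain the main feature mapping.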

Original language	English
Title of host publication	2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	181-185
Number of pages	5
ISBN (Electronic)	9781728132488
DOIs	https://doi.org/10.1109/APSIPAASC47483.2019.9023325
Publication status	Published - Nov 2019
Event	2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019 - Lanzhou, China Duration: 18 Nov 2019 → 21 Nov 2019

Publication series

Name	2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019

Conference

Conference	2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019
Country/Territory	China
City	Lanzhou
Period	18/11/19 → 21/11/19

Access to Document

10.1109/APSIPAASC47483.2019.9023325

Cite this

Liu, M., Wang, Y., Yan, Z., Wang, J., & Xie, X. (2019). Robust speech recognition based on multi-objective learning with GRU network. In 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019 (pp. 181-185). Article 9023325 (2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/APSIPAASC47483.2019.9023325

Liu, Ming ; Wang, Yujun ; Yan, Zhaoyu et al. / Robust speech recognition based on multi-objective learning with GRU network. 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019. Institute of Electrical and Electronics Engineers Inc., 2019. pp. 181-185 (2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019).

@inproceedings{fd580d98076a47009d151a3359fe58c1,

title = "Robust speech recognition based on multi-objective learning with GRU network",

abstract = "This paper proposes a new scheme for speech enhancement (SE) aimed at recognition, based on a multi-objective learning method that uses three objectives in the training procedure of a gated recurrent unit (GRU) network. The first objective is the main target of the SE task: directly mapping noisy log-power spectrum (LPS) features to clean Mel-frequency cepstral coefficient (MFCC) features. The second is an auxiliary target that helps improve the main one by learning additional information from the backend acoustic model (AM). The third is also an auxiliary target, achieved by learning information from the mapping of noisy LPS to clean LPS. The two auxiliary structures help the original structure optimize the network parameters by correcting errors. This approach imposes more constraints on the direct feature mapping and passes information from the acoustic model to the network, enabling the enhancement network to better serve the AM. The experimental results show that the new multi-objective scheme, with joint feature mapping and posterior probability learning, improves the performance of SE. The scheme also significantly lowers the character error rate (CER) of the AM compared to the baseline deep neural network (DNN). (This work was done while Ming Liu was an intern in the Speech Group, Xiaomi Corporation, Beijing, China.)",

author = "Ming Liu and Yujun Wang and Zhaoyu Yan and Jing Wang and Xiang Xie",

note = "Publisher Copyright: {\textcopyright} 2019 IEEE.; 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019 ; Conference date: 18-11-2019 Through 21-11-2019",

year = "2019",

month = nov,

doi = "10.1109/APSIPAASC47483.2019.9023325",

language = "English",

series = "2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "181--185",

booktitle = "2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019",

address = "United States",

}

Liu, M, Wang, Y, Yan, Z, Wang, J & Xie, X 2019, Robust speech recognition based on multi-objective learning with GRU network. in 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019., 9023325, 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019, Institute of Electrical and Electronics Engineers Inc., pp. 181-185, 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019, Lanzhou, China, 18/11/19. https://doi.org/10.1109/APSIPAASC47483.2019.9023325

Robust speech recognition based on multi-objective learning with GRU network. / Liu, Ming; Wang, Yujun; Yan, Zhaoyu et al.
2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019. Institute of Electrical and Electronics Engineers Inc., 2019. p. 181-185 9023325 (2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019).

TY - GEN

T1 - Robust speech recognition based on multi-objective learning with GRU network

AU - Liu, Ming

AU - Wang, Yujun

AU - Yan, Zhaoyu

AU - Wang, Jing

AU - Xie, Xiang

PY - 2019/11

Y1 - 2019/11

AB - This paper proposes a new scheme for speech enhancement (SE) aimed at recognition, based on a multi-objective learning method that uses three objectives in the training procedure of a gated recurrent unit (GRU) network. The first objective is the main target of the SE task: directly mapping noisy log-power spectrum (LPS) features to clean Mel-frequency cepstral coefficient (MFCC) features. The second is an auxiliary target that helps improve the main one by learning additional information from the backend acoustic model (AM). The third is also an auxiliary target, achieved by learning information from the mapping of noisy LPS to clean LPS. The two auxiliary structures help the original structure optimize the network parameters by correcting errors. This approach imposes more constraints on the direct feature mapping and passes information from the acoustic model to the network, enabling the enhancement network to better serve the AM. The experimental results show that the new multi-objective scheme, with joint feature mapping and posterior probability learning, improves the performance of SE. The scheme also significantly lowers the character error rate (CER) of the AM compared to the baseline deep neural network (DNN). (This work was done while Ming Liu was an intern in the Speech Group, Xiaomi Corporation, Beijing, China.)

UR - http://www.scopus.com/inward/record.url?scp=85082402190&partnerID=8YFLogxK

U2 - 10.1109/APSIPAASC47483.2019.9023325

DO - 10.1109/APSIPAASC47483.2019.9023325

M3 - Conference contribution

AN - SCOPUS:85082402190

T3 - 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019

SP - 181

EP - 185

BT - 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019

Y2 - 18 November 2019 through 21 November 2019

ER -

Liu M, Wang Y, Yan Z, Wang J, Xie X. Robust speech recognition based on multi-objective learning with GRU network. In 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019. Institute of Electrical and Electronics Engineers Inc. 2019. p. 181-185. 9023325. (2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019). doi: 10.1109/APSIPAASC47483.2019.9023325
