End-to-end Oriental Language Speech Recognition with Integrated Language Identification

Anbin Qi; Xiang Xie; Qingran Zhan; Chenguang Hu; Xinmei Su

doi:10.1109/MLCR57210.2022.00014

Anbin Qi, Xiang Xie^*, Qingran Zhan, Chenguang Hu, Xinmei Su

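^*Corresponding author for this work

School of Information and Electronics

Beijing Institute of Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-reviewed

Abstract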

In recent years, with the rise of human-computer interaction and the successful application of end-to-end models to speech recognition, building end-to-end speech recognition models has received extensive attention. Drawing on multi-task learning and the connection between language identification and speech recognition, we propose an end-to-end Transformer model for multilingual speech recognition that integrates language identification. The model treats speech recognition as the main task and language identification as the auxiliary task. We verify the model on the datasets of 13 languages from the 2021 Oriental Language Recognition (OLR) challenge. Experimental results show that the model achieves a relative improvement of 37.46% on the speech recognition task over the baseline system provided by the OLR organizers. Language identification accuracy reaches 89.70%. These results would be equivalent to fifth place in the constrained speech recognition track of OLR 2021.
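The abstract does not say how the main and auxiliary objectives are combined, nor which error metric underlies the reported relative improvement. The following is only a minimal sketch under stated assumptions: a weighted sum of the two losses (a common multi-task choice, not confirmed as the authors' method) and the usual relative error reduction formula. The names `multitask_loss`, `relative_improvement`, and `lid_weight`, and all numeric values, are hypothetical.

```python
def multitask_loss(asr_loss: float, lid_loss: float, lid_weight: float = 0.1) -> float:
    """Weighted sum of the main (ASR) loss and the auxiliary (LID) loss.

    Assumption: the weighting scheme and the default weight are illustrative,
    not taken from the paper.
    """
    if not 0.0 <= lid_weight <= 1.0:
        raise ValueError("lid_weight must lie in [0, 1]")
    return (1.0 - lid_weight) * asr_loss + lid_weight * lid_loss


def relative_improvement(baseline_error: float, system_error: float) -> float:
    """Relative error reduction over a baseline, in percent."""
    if baseline_error <= 0.0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - system_error) / baseline_error


if __name__ == "__main__":
    # Hypothetical values, chosen only to exercise the functions.
    print(multitask_loss(asr_loss=2.0, lid_loss=1.0, lid_weight=0.25))
    print(relative_improvement(baseline_error=50.0, system_error=40.0))
```

With a small `lid_weight`, the speech recognition loss dominates training, which matches the abstract's framing of language identification as an auxiliary task.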

Original language	English
Title of host publication	Proceedings - 2022 International Conference on Machine Learning, Control, and Robotics, MLCR 2022
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	27-31
Number of pages	5
ISBN (Electronic)	9781665454599
DOI	https://doi.org/10.1109/MLCR57210.2022.00014
Publication status	Published - 2022
Event	2022 International Conference on Machine Learning, Control, and Robotics, MLCR 2022 - Suzhou, China; Duration: 29 Oct 2022 → 31 Oct 2022

Publication series

Name	Proceedings - 2022 International Conference on Machine Learning, Control, and Robotics, MLCR 2022

Conference

Conference	2022 International Conference on Machine Learning, Control, and Robotics, MLCR 2022
Country/Territory	China
City	Suzhou
Period	29/10/22 → 31/10/22

Access to Document

10.1109/MLCR57210.2022.00014

Other files and links

Link to publication in Scopus

Cite this

Qi, A., Xie, X., Zhan, Q., Hu, C., & Su, X. (2022). End-to-end Oriental Language Speech Recognition with Integrated Language Identification. In Proceedings - 2022 International Conference on Machine Learning, Control, and Robotics, MLCR 2022 (pp. 27-31). (Proceedings - 2022 International Conference on Machine Learning, Control, and Robotics, MLCR 2022). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/MLCR57210.2022.00014

Qi, Anbin ; Xie, Xiang ; Zhan, Qingran et al. / End-to-end Oriental Language Speech Recognition with Integrated Language Identification. Proceedings - 2022 International Conference on Machine Learning, Control, and Robotics, MLCR 2022. Institute of Electrical and Electronics Engineers Inc., 2022. pp. 27-31 (Proceedings - 2022 International Conference on Machine Learning, Control, and Robotics, MLCR 2022).

@inproceedings{1f3061f8b7be46d6b312cb0de8fdb24f,

title = "End-to-end Oriental Language Speech Recognition with Integrated Language Identification",

abstract = "In recent years, with the rise of human-computer interaction and the successful application of end-to-end models to speech recognition, building end-to-end speech recognition models has received extensive attention. Drawing on multi-task learning and the connection between language identification and speech recognition, we propose an end-to-end Transformer model for multilingual speech recognition that integrates language identification. The model treats speech recognition as the main task and language identification as the auxiliary task. We verify the model on the datasets of 13 languages from the 2021 Oriental Language Recognition (OLR) challenge. Experimental results show that the model achieves a relative improvement of 37.46\% on the speech recognition task over the baseline system provided by the OLR organizers. Language identification accuracy reaches 89.70\%. These results would be equivalent to fifth place in the constrained speech recognition track of OLR 2021.",

keywords = "End-to-end, Language Identification, Multi-task learning, Oriental Languages, Speech Recognition",

author = "Anbin Qi and Xiang Xie and Qingran Zhan and Chenguang Hu and Xinmei Su",

note = "Publisher Copyright: {\textcopyright} 2022 IEEE.; 2022 International Conference on Machine Learning, Control, and Robotics, MLCR 2022 ; Conference date: 29-10-2022 Through 31-10-2022",

year = "2022",

doi = "10.1109/MLCR57210.2022.00014",

language = "English",

series = "Proceedings - 2022 International Conference on Machine Learning, Control, and Robotics, MLCR 2022",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "27--31",

booktitle = "Proceedings - 2022 International Conference on Machine Learning, Control, and Robotics, MLCR 2022",

address = "United States",

}

Qi, A, Xie, X, Zhan, Q, Hu, C & Su, X 2022, End-to-end Oriental Language Speech Recognition with Integrated Language Identification. in Proceedings - 2022 International Conference on Machine Learning, Control, and Robotics, MLCR 2022. Proceedings - 2022 International Conference on Machine Learning, Control, and Robotics, MLCR 2022, Institute of Electrical and Electronics Engineers Inc., pp. 27-31, 2022 International Conference on Machine Learning, Control, and Robotics, MLCR 2022, Suzhou, China, 29/10/22. https://doi.org/10.1109/MLCR57210.2022.00014

End-to-end Oriental Language Speech Recognition with Integrated Language Identification. / Qi, Anbin; Xie, Xiang; Zhan, Qingran et al.
Proceedings - 2022 International Conference on Machine Learning, Control, and Robotics, MLCR 2022. Institute of Electrical and Electronics Engineers Inc., 2022. pp. 27-31 (Proceedings - 2022 International Conference on Machine Learning, Control, and Robotics, MLCR 2022).

TY - GEN

T1 - End-to-end Oriental Language Speech Recognition with Integrated Language Identification

AU - Qi, Anbin

AU - Xie, Xiang

AU - Zhan, Qingran

AU - Hu, Chenguang

AU - Su, Xinmei

PY - 2022

Y1 - 2022

AB - In recent years, with the rise of human-computer interaction and the successful application of end-to-end models to speech recognition, building end-to-end speech recognition models has received extensive attention. Drawing on multi-task learning and the connection between language identification and speech recognition, we propose an end-to-end Transformer model for multilingual speech recognition that integrates language identification. The model treats speech recognition as the main task and language identification as the auxiliary task. We verify the model on the datasets of 13 languages from the 2021 Oriental Language Recognition (OLR) challenge. Experimental results show that the model achieves a relative improvement of 37.46% on the speech recognition task over the baseline system provided by the OLR organizers. Language identification accuracy reaches 89.70%. These results would be equivalent to fifth place in the constrained speech recognition track of OLR 2021.

KW - End-to-end

KW - Language Identification

KW - Multi-task learning

KW - Oriental Languages

KW - Speech Recognition

UR - http://www.scopus.com/inward/record.url?scp=85148629974&partnerID=8YFLogxK

U2 - 10.1109/MLCR57210.2022.00014

DO - 10.1109/MLCR57210.2022.00014

M3 - Conference contribution

AN - SCOPUS:85148629974

T3 - Proceedings - 2022 International Conference on Machine Learning, Control, and Robotics, MLCR 2022

SP - 27

EP - 31

BT - Proceedings - 2022 International Conference on Machine Learning, Control, and Robotics, MLCR 2022

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 2022 International Conference on Machine Learning, Control, and Robotics, MLCR 2022

Y2 - 29 October 2022 through 31 October 2022

ER -

Qi A, Xie X, Zhan Q, Hu C, Su X. End-to-end Oriental Language Speech Recognition with Integrated Language Identification. In Proceedings - 2022 International Conference on Machine Learning, Control, and Robotics, MLCR 2022. Institute of Electrical and Electronics Engineers Inc. 2022. p. 27-31. (Proceedings - 2022 International Conference on Machine Learning, Control, and Robotics, MLCR 2022). doi: 10.1109/MLCR57210.2022.00014