DataShift: A Cross-Modal Data Augmentation Method for Speech Recognition and Machine Translation

Haodong Cheng, Yuhang Guo*

*Corresponding author of this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-review

1 Citation (Scopus)

Abstract

Data augmentation has been successful in tasks of different modalities such as speech and text. In this paper, we present a cross-modal data augmentation method, DataShift, to improve the performance of automatic speech recognition (ASR) and machine translation (MT) by randomly shifting values of the feature sequence along the time or frequency dimensions, respectively. Experimental results show that our data augmentation method improves performance by 4% in word error rate (WER) and 0.36 BLEU score on average on the ASR and MT datasets, respectively.
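The abstract does not spell out how the shift is applied. As a minimal sketch of one plausible reading (a random circular shift of a 2-D feature matrix along either the time or the frequency/feature axis), the snippet below uses NumPy; the function name, parameters, and shift strategy are illustrative assumptions, not the paper's confirmed implementation.

```python
import numpy as np


def random_shift(features: np.ndarray, axis: int, max_shift: int, rng=None) -> np.ndarray:
    """Randomly shift a feature sequence along one axis (illustrative sketch).

    features  : 2-D array, e.g. (time, mel_bins) for ASR filterbank features
                or (time, embedding_dim) for MT token embeddings.
    axis      : 0 to shift along time, 1 to shift along the feature dimension.
    max_shift : maximum absolute shift in frames/bins.
    """
    rng = rng or np.random.default_rng()
    offset = int(rng.integers(-max_shift, max_shift + 1))
    # Circular shift of values along the chosen axis; one possible
    # interpretation of "shifting values of the feature sequence".
    return np.roll(features, offset, axis=axis)


# Example: augment an 80-dim log-Mel filterbank sequence of 200 frames.
fbank = np.random.randn(200, 80).astype(np.float32)
aug_time = random_shift(fbank, axis=0, max_shift=10)  # shift along time
aug_freq = random_shift(fbank, axis=1, max_shift=4)   # shift along frequency
```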

Original language: English
Host publication title: Proceedings - 2022 4th International Conference on Natural Language Processing, ICNLP 2022
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 341-344
Number of pages: 4
ISBN (electronic): 9781665495448
DOI
Publication status: Published - 2022
Event: 4th International Conference on Natural Language Processing, ICNLP 2022 - Xi'an, China
Duration: 25 Mar 2022 - 27 Mar 2022

Publication series

Name: Proceedings - 2022 4th International Conference on Natural Language Processing, ICNLP 2022

Conference

Conference: 4th International Conference on Natural Language Processing, ICNLP 2022
Country/Territory: China
City: Xi'an
Period: 25/03/22 - 27/03/22
