DataShift: A Cross-Modal Data Augmentation Method for Speech Recognition and Machine Translation

Haodong Cheng, Yuhang Guo*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review


Abstract

Data augmentation has been successful in tasks of different modalities such as speech and text. In this paper, we present a cross-modal data augmentation method, DataShift, which improves the performance of automatic speech recognition (ASR) and machine translation (MT) by randomly shifting values of the feature sequence along the time or frequency dimension, respectively. Experimental results show that our data augmentation method reduces the word error rate (WER) by 4% on average on the ASR datasets and improves the BLEU score by 0.36 on average on the MT datasets.
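Since the abstract gives only a high-level description, the core operation can be illustrated with a short NumPy sketch. This is a minimal illustration, not the paper's implementation: the function name data_shift, the max_shift parameter, and the wrap-around (np.roll) behaviour are assumptions filled in for clarity.

```python
import numpy as np

def data_shift(features: np.ndarray, max_shift: int = 10, axis: int = 0) -> np.ndarray:
    """Randomly shift a 2-D feature sequence along one dimension.

    features : e.g. (time, frequency) log-Mel filterbanks for ASR,
               or (time, embedding) token features for MT.
    max_shift: largest absolute shift, in steps (hypothetical default).
    axis     : 0 shifts along time, 1 shifts along frequency.
    """
    shift = np.random.randint(-max_shift, max_shift + 1)
    # np.roll moves every value along the chosen axis with wrap-around;
    # the paper may instead pad or truncate at the boundary, a detail
    # the abstract does not specify.
    return np.roll(features, shift, axis=axis)

# Example: augment an 80-dim log-Mel spectrogram along the time axis.
spectrogram = np.random.randn(500, 80)
augmented = data_shift(spectrogram, max_shift=20, axis=0)
```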

Original language: English
Title of host publication: Proceedings - 2022 4th International Conference on Natural Language Processing, ICNLP 2022
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 341-344
Number of pages: 4
ISBN (Electronic): 9781665495448
DOIs
Publication status: Published - 2022
Event: 4th International Conference on Natural Language Processing, ICNLP 2022 - Xi'an, China
Duration: 25 Mar 2022 – 27 Mar 2022

Publication series

Name: Proceedings - 2022 4th International Conference on Natural Language Processing, ICNLP 2022

Conference

Conference: 4th International Conference on Natural Language Processing, ICNLP 2022
Country/Territory: China
City: Xi'an
Period: 25/03/22 – 27/03/22

Keywords

  • automatic speech recognition
  • data augmentation
  • machine translation
