Improved mini-batch multiple augmentation for low-resource spoken word recognition

Alexander Rogath Kivaisi, Qingjie Zhao*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Data augmentation is a useful way to cope with limited training data in machine learning tasks. Recently, spectrogram augmentation techniques have been investigated for voice conversion and sound classification, where they have improved results. However, applying multiple data augmentation techniques within a mini-batch has been observed to degrade performance. Although applying multiple augmentation methods sequentially has yielded performance gains on image data, transferring this approach to spectrogram data leads to a loss of acoustic information. An alternative approach is therefore needed to use multiple augmentation methods effectively in the speech domain. This study addresses these challenges for spoken word recognition in low-resource settings, with augmentation applied within the mini-batch. First, we investigate the effect of data augmentation techniques. Second, we investigate the effect of combining multiple data augmentation techniques. Finally, we propose a new approach that uses an alternate mechanism to exploit multiple spectrogram augmentation techniques more effectively. Our experimental results show that the proposed approach (new pattern) significantly outperforms the sequential approach (traditional pattern) across datasets of different scales, including low-resource settings. In addition, the proposed approach achieves an approximately 2x actual speedup over the sequential approach. A combination of frequency-warping and time length control augmentation methods was found to be stable and robust across all datasets evaluated.
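To make the contrast between the two patterns concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify how the alternate mechanism works, so the code assumes one plausible reading: the augmenters take turns across the samples of a mini-batch instead of being chained on every sample. The function names (`freq_mask`, `time_mask`, `augment_sequential`, `augment_alternate`) are hypothetical, and simple frequency and time masking stand in for the frequency-warping and time length control methods named above.

```python
import numpy as np


def freq_mask(spec, rng, max_width=8):
    """Stand-in augmentation: zero a random band of frequency bins (rows)."""
    out = spec.copy()
    width = int(rng.integers(1, max_width + 1))
    start = int(rng.integers(0, spec.shape[0] - width + 1))
    out[start:start + width, :] = 0.0
    return out


def time_mask(spec, rng, max_width=8):
    """Stand-in augmentation: zero a random span of time frames (columns)."""
    out = spec.copy()
    width = int(rng.integers(1, max_width + 1))
    start = int(rng.integers(0, spec.shape[1] - width + 1))
    out[:, start:start + width] = 0.0
    return out


def augment_sequential(batch, augmenters, rng):
    """Sequential (traditional) pattern: chain every augmenter on every sample."""
    out = []
    for spec in batch:
        for aug in augmenters:
            spec = aug(spec, rng)
        out.append(spec)
    return np.stack(out)


def augment_alternate(batch, augmenters, rng):
    """ASSUMED alternate pattern: sample i receives only augmenter i mod k."""
    k = len(augmenters)
    return np.stack([augmenters[i % k](spec, rng) for i, spec in enumerate(batch)])


# Toy mini-batch: 4 spectrograms with 40 frequency bins and 50 time frames.
rng = np.random.default_rng(0)
batch = np.ones((4, 40, 50), dtype=np.float32)
seq = augment_sequential(batch, [freq_mask, time_mask], rng)
alt = augment_alternate(batch, [freq_mask, time_mask], rng)
print(seq.shape, alt.shape)
```

Under this assumed reading, each sample in the alternate pattern is distorted by only one method, so less of its acoustic content is removed, and with two augmenters the batch needs one augmentation call per sample rather than two. Whether this is what produces the reported accuracy gain or the approximately 2x speedup is not stated in the abstract.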

Original language: English
Article number: 124157
Journal: Expert Systems with Applications
Volume: 252
Publication status: Published - 15 Oct 2024

Keywords

  • Low-resource languages
  • Mini-batch
  • Multiple augmentation
  • Spectrogram
  • Spoken word recognition
