Abstract
Speech datasets are an essential component in building commercial speech applications. However, low-resource languages such as Swahili lack such resources, which are vital for spoken digit recognition. For languages where such resources exist, they are usually insufficient. Pre-training methods have therefore been used with external resources to improve continuous speech recognition. However, to the best of our knowledge, no study has investigated the effect of pre-training methods specifically on spoken digit recognition. This study aims to address these problems. First, we developed a Swahili spoken digit dataset for Swahili spoken digit recognition. Then, we investigated the effect of cross-lingual and multi-lingual pre-training methods on spoken digit recognition. Finally, we proposed an effective language-independent pre-training method for spoken digit recognition. The proposed method has the advantage of incorporating target-language data during the pre-training stage, which leads to an optimal solution when less training data is used. Experiments on Swahili (the dataset developed in this study), English, and Gujarati datasets show that our method outperforms all the baselines considered in this study.
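The three pre-training settings named in the abstract differ mainly in which data feeds the pre-training stage. The sketch below is only an illustration of that difference, under the assumption that cross-lingual means one source language, multi-lingual means several, and the proposed setting also adds target-language training data. The function `build_pretraining_pool` and the toy utterance ids are hypothetical; the abstract does not specify the paper's actual procedure or model.

```python
import random

def build_pretraining_pool(source_sets, target_train=None, seed=0):
    """Pool labelled utterances for a pre-training stage (illustrative only).

    source_sets:  dict mapping language name -> list of (utterance_id, digit)
    target_train: optional list of (utterance_id, digit) from the target language
    Returns a shuffled list of (language, utterance_id, digit) triples.
    """
    pool = []
    for language, samples in source_sets.items():
        pool.extend((language, utt, digit) for utt, digit in samples)
    # Assumed reading of the proposed setting: target-language training data
    # joins the pool instead of being held back for fine-tuning only.
    if target_train is not None:
        pool.extend(("target", utt, digit) for utt, digit in target_train)
    random.Random(seed).shuffle(pool)  # seeded for reproducibility
    return pool

# Hypothetical toy data: ids are placeholders, not real recordings.
english = [(f"en_{i}", i % 10) for i in range(20)]
gujarati = [(f"gu_{i}", i % 10) for i in range(20)]
swahili_train = [(f"sw_{i}", i % 10) for i in range(5)]

cross_lingual = build_pretraining_pool({"english": english})
multi_lingual = build_pretraining_pool({"english": english, "gujarati": gujarati})
with_target = build_pretraining_pool(
    {"english": english, "gujarati": gujarati}, swahili_train
)
```

In this sketch only `with_target` contains target-language samples before fine-tuning begins, which is the property the abstract credits for better results with less training data.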
| Original language | English |
|---|---|
| Article number | 3597494 |
| Journal | ACM Transactions on Asian and Low-Resource Language Information Processing |
| Volume | 22 |
| Issue number | 7 |
| Publication status | Published - 20 Jul 2023 |
Keywords
- Swahili language
- convolutional neural network
- cross-lingual
- low-resource language
- multi-lingual
- pre-training
- spoken digit recognition
Title: Swahili Speech Dataset Development and Improved Pre-training Method for Spoken Digit Recognition