Multi-Task ConvMixer Networks with Triplet Attention for Low-Resource Keyword Spotting

  • Alexander Rogath Kivaisi
  • Qingjie Zhao*
  • Yuanbing Zou
  • *Corresponding author for this work

Research output: Contribution to journal › Article › Peer-review

Abstract

Customized keyword spotting must adapt quickly to a small number of user samples. Current methods primarily address this problem under moderate noise conditions. Recent work makes keyword detection harder by introducing keyword interference. However, this solution has been explored on large models with many parameters, which makes it unsuitable for deployment on small devices. When it is applied to lightweight models with minimal training data, performance degrades relative to the baseline model. To address this challenge, we propose a lightweight multi-task architecture (<9.0×10⁴ parameters) that integrates the triplet attention module into ConvMixer networks and adds a new auxiliary mixed labeling encoding. Experimental results show that the proposed model outperforms comparable lightweight keyword spotting models, with accuracy gains of 0.73% to 2.95% on a clean set and 2.01% to 3.37% on a mixed set across different training-set sizes. Furthermore, the model remains robust on datasets in different low-resource languages while converging faster.
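The abstract describes the architecture only at a high level. For readers unfamiliar with triplet attention, here is a minimal NumPy sketch of the module as formulated by Misra et al. (2021). Three branches each rotate the feature map so that a different axis is pooled. Each branch then applies Z-pooling (max and mean), a k×k convolution, and a sigmoid gate, and the three gated outputs are averaged. This is an illustration under stated assumptions, not the authors' implementation. Batch normalization is omitted, and the 7×7 kernels are random placeholders. The (channels, time, frequency) input shape and all function names are hypothetical. How the module is wired into each ConvMixer block in this paper is not shown.

```python
import numpy as np


def z_pool(x):
    """Z-pool: stack the max and the mean over axis 0, giving shape (2, A, B)."""
    return np.stack([x.max(axis=0), x.mean(axis=0)], axis=0)


def conv2d_same(x, w):
    """Cross-correlate a (2, A, B) map with a (2, k, k) kernel using zero
    'same' padding; returns a single-channel (A, B) map (no bias)."""
    k = w.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    a, b = x.shape[1:]
    out = np.zeros((a, b))
    for i in range(k):
        for j in range(k):
            # Weighted sum over the 2 pooled channels at this kernel offset.
            out += np.einsum("c,cab->ab", w[:, i, j], xp[:, i:i + a, j:j + b])
    return out


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def attention_gate(x, w):
    """One branch: Z-pool over axis 0, convolve, squash to (0, 1), rescale x."""
    gate = sigmoid(conv2d_same(z_pool(x), w))
    return x * gate[None, :, :]  # broadcast the 2-D gate over axis 0


def triplet_attention(x, weights):
    """x: feature map of shape (C, H, W); weights: three (2, k, k) kernels.
    BatchNorm from the original module is omitted in this sketch."""
    # Branch 1: rotate to (H, C, W) so H is pooled; gates C-W interactions.
    b1 = attention_gate(x.transpose(1, 0, 2), weights[0]).transpose(1, 0, 2)
    # Branch 2: rotate to (W, H, C) so W is pooled; gates C-H interactions.
    b2 = attention_gate(x.transpose(2, 1, 0), weights[1]).transpose(2, 1, 0)
    # Branch 3: pool over channels; plain spatial (H-W) attention.
    b3 = attention_gate(x, weights[2])
    return (b1 + b2 + b3) / 3.0


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((16, 10, 20))              # hypothetical (channels, time, freq) map
    kernels = 0.1 * rng.standard_normal((3, 2, 7, 7))  # placeholder weights, one kernel per branch
    y = triplet_attention(x, kernels)
    print(y.shape)  # (16, 10, 20)
```

Each branch learns only a 2×k×k kernel (98 weights for k = 7, plus normalization parameters in the original formulation). The whole module therefore costs a few hundred parameters, which makes it a plausible fit for a model with fewer than 9.0×10⁴ parameters.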

Original language: English
Pages (from-to): 875-893
Number of pages: 19
Journal: Tsinghua Science and Technology
Volume: 30
Issue number: 2
Publication status: Published - 2025
Externally published: Yes

Keywords

  • keyword spotting (KWS)
  • cross-dimension attention
  • low-resource
  • mixed speech
  • multi-task learning

