Separable Temporal Convolution plus Temporally Pooled Attention for Lightweight High-Performance Keyword Spotting

Shenghua Hu, Jing Wang, Yujun Wang, Wenjing Yang

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

Abstract

Keyword spotting (KWS) on mobile devices generally requires a small memory footprint. However, most current models still retain a large number of parameters to ensure good performance. In this paper, we propose a temporally pooled attention module that captures global features better than average pooling. In addition, we design a separable temporal convolution network that combines depthwise separable convolution and temporal convolution to reduce the number of parameters and computations. Finally, taking advantage of separable temporal convolution and temporally pooled attention, we design an efficient neural network (ST-AttNet) for KWS systems. We evaluate the models on the publicly available Google Speech Commands dataset V1. The proposed model has 48K parameters, about 1/6 of the state-of-the-art TC-ResNet14-1.5 model (305K). It achieves 96.6% accuracy, comparable to the TC-ResNet14-1.5 model (96.60%).
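
The two ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the layer sizes or the exact form of the attention module, so the channel counts, kernel width, and the `query` vector below are hypothetical. The first pair of functions counts the weights in a standard temporal (1-D) convolution and in its depthwise separable counterpart. The second pair contrasts average pooling over time with a generic softmax-weighted pooling over time.

```python
import math


def standard_conv1d_params(c_in, c_out, kernel):
    """Weights in a standard 1-D (temporal) convolution, bias omitted."""
    return c_in * c_out * kernel


def separable_conv1d_params(c_in, c_out, kernel):
    """Depthwise weights (one kernel per input channel) plus 1x1 pointwise weights."""
    return c_in * kernel + c_in * c_out


def average_pool(frames):
    """Mean over time: every frame receives the same weight 1/T."""
    t = len(frames)
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / t for d in range(dim)]


def attention_pool(frames, query):
    """Softmax-weighted mean over time.

    Each frame is scored by its dot product with `query` (a hypothetical
    learned vector); the scores are normalised with a softmax and used
    as per-frame weights.
    """
    scores = [sum(x * q for x, q in zip(f, query)) for f in frames]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(frames[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]
    return pooled, weights


if __name__ == "__main__":
    # Illustrative sizes only; they are not taken from the paper.
    print(standard_conv1d_params(64, 64, 9))
    print(separable_conv1d_params(64, 64, 9))

    frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # T=3 frames, 2 features each
    print(average_pool(frames))
    print(attention_pool(frames, query=[5.0, 0.0]))
```

With a zero `query` the softmax weights are uniform and the attention pooling reduces to average pooling; a non-zero `query` lets some frames contribute more than others, which is the general motivation for replacing a plain average with an attention-weighted one.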

Original language: English
Title of host publication: 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1072-1076
Number of pages: 5
ISBN (electronic): 9789881476890
Publication status: Published - 2021
Event: 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Tokyo, Japan
Duration: 14 Dec 2021 to 17 Dec 2021

Publication series

Name: 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Proceedings

Conference

Conference: 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021
Country/Territory: Japan
City: Tokyo
Period: 14/12/21 to 17/12/21
