TY - GEN
T1 - Separable Temporal Convolution plus Temporally Pooled Attention for Lightweight High-Performance Keyword Spotting
AU - Hu, Shenghua
AU - Wang, Jing
AU - Wang, Yujun
AU - Yang, Wenjing
N1 - Publisher Copyright:
© 2021 APSIPA.
PY - 2021
Y1 - 2021
N2 - Keyword spotting (KWS) on mobile devices generally requires a small memory footprint. However, most current models still maintain a large number of parameters in order to ensure good performance. In this paper, we propose a temporally pooled attention module which can capture global features better than average pooling. In addition, we design a separable temporal convolution network which leverages depthwise separable and temporal convolutions to reduce the number of parameters and calculations. Finally, taking advantage of separable temporal convolution and temporally pooled attention, an efficient neural network (ST-AttNet) is designed for KWS systems. We evaluate the models on the publicly available Google Speech Commands dataset V1. The number of parameters of the proposed model (48K) is 1/6 that of the state-of-the-art TC-ResNet14-1.5 model (305K). The proposed model achieves 96.6% accuracy, which is comparable to that of the TC-ResNet14-1.5 model (96.60%).
AB - Keyword spotting (KWS) on mobile devices generally requires a small memory footprint. However, most current models still maintain a large number of parameters in order to ensure good performance. In this paper, we propose a temporally pooled attention module which can capture global features better than average pooling. In addition, we design a separable temporal convolution network which leverages depthwise separable and temporal convolutions to reduce the number of parameters and calculations. Finally, taking advantage of separable temporal convolution and temporally pooled attention, an efficient neural network (ST-AttNet) is designed for KWS systems. We evaluate the models on the publicly available Google Speech Commands dataset V1. The number of parameters of the proposed model (48K) is 1/6 that of the state-of-the-art TC-ResNet14-1.5 model (305K). The proposed model achieves 96.6% accuracy, which is comparable to that of the TC-ResNet14-1.5 model (96.60%).
UR - http://www.scopus.com/inward/record.url?scp=85126687306&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85126687306
T3 - 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Proceedings
SP - 1072
EP - 1076
BT - 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021
Y2 - 14 December 2021 through 17 December 2021
ER -