End-to-End Speech Keyword Spotting System

Shenghua Hu*, Hanyue Liu, Liang Xu, Jing Wang, Yujun Wang, Peng Gao, Weiji Zhuang

*Corresponding author for this work

Research output: Chapter in book/report/conference proceeding, conference contribution, peer-reviewed

Abstract

Speech keyword spotting aims to detect a set of predefined keywords in a continuous speech stream. Building on research into end-to-end deep learning techniques, this paper designs and implements an end-to-end speech keyword spotting algorithm with applications in devices such as smartphones and automobiles. The algorithm first trains an acoustic model based on a deep neural network, which takes acoustic features as input and outputs the posterior probability of the wake-up word. The posterior probability is then smoothed to obtain a confidence score for the wake-up word. This pipeline avoids the traditional decoding process. In addition, the paper compares several neural network structures for the acoustic model, such as the time-delay neural network (TDNN) and the factorized time-delay neural network (TDNN-F). Controlled comparative experiments indicate that the proposed end-to-end speech keyword spotting algorithm performs competitively with other popular techniques.
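The smoothing step described in the abstract can be illustrated with a minimal sketch. The abstract does not state which smoothing method, window length, or threshold the authors use, so the causal moving average, the `window` and `threshold` parameters, and the function names `smooth_posteriors` and `detect_keyword` below are all assumptions made for illustration only.

```python
import numpy as np


def smooth_posteriors(posteriors, window=30):
    """Causal moving average (an assumed smoothing scheme, not the paper's).

    Frame t is replaced by the mean of frames max(0, t - window + 1) .. t.
    """
    p = np.asarray(posteriors, dtype=float)
    # Prefix sums let each window mean be computed in constant time.
    csum = np.concatenate(([0.0], np.cumsum(p)))
    end = np.arange(1, len(p) + 1)
    start = np.maximum(0, end - window)
    return (csum[end] - csum[start]) / (end - start)


def detect_keyword(posteriors, window=30, threshold=0.5):
    """Return (confidence, fired) for one stream of wake-up word posteriors.

    The confidence is taken as the peak of the smoothed posterior; the
    threshold value is hypothetical and would be tuned on held-out data.
    """
    smoothed = smooth_posteriors(posteriors, window)
    confidence = float(smoothed.max()) if smoothed.size else 0.0
    return confidence, confidence >= threshold


# Hypothetical per-frame posteriors as an acoustic model might emit them.
frames = [0.02, 0.05, 0.90, 0.95, 0.90, 0.10]
confidence, fired = detect_keyword(frames, window=3, threshold=0.8)
```

Because the decision depends only on per-frame posteriors and a running average, no decoding graph or search is required, which is the property the abstract highlights.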

Original language: English
Title of host publication: ICCSI 2023 - 2023 International Conference on Cyber-Physical Social Intelligence
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 215-220
Number of pages: 6
ISBN (electronic): 9798350312492
DOI
Publication status: Published - 2023
Event: 2023 International Conference on Cyber-Physical Social Intelligence, ICCSI 2023 - Xi'an, China
Duration: 20 Oct 2023 to 23 Oct 2023

Publication series

Name: ICCSI 2023 - 2023 International Conference on Cyber-Physical Social Intelligence

Conference

Conference: 2023 International Conference on Cyber-Physical Social Intelligence, ICCSI 2023
Country/Territory: China
City: Xi'an
Period: 20/10/23 to 23/10/23
