End-to-End Speech Keyword Spotting System

Shenghua Hu*, Hanyue Liu, Liang Xu, Jing Wang, Yujun Wang, Peng Gao, Weiji Zhuang

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review


The purpose of speech keyword spotting is to detect a set of predefined keywords in a continuous speech signal stream. Building on end-to-end techniques from deep learning, this paper designs and implements an end-to-end speech keyword spotting algorithm with a wide range of applications, for example in smartphones and automobiles. The algorithm first trains a deep-neural-network acoustic model that takes acoustic features as input and outputs the posterior probability of the wake-up word. These posteriors are then smoothed to obtain a confidence score for the wake-up word. In this way, the traditional decoding process is avoided. In addition, the paper compares several neural network architectures for the acoustic model, including the time-delay neural network (TDNN) and the factorized time-delay neural network (TDNN-F). Controlled comparative experiments show that the proposed end-to-end keyword spotting algorithm achieves performance competitive with other popular approaches.
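The abstract does not state which smoothing method, window length, or decision threshold is used. As a minimal sketch, assuming one wake-up-word posterior per frame, a causal moving average over the last `window` frames, and a fixed detection threshold (all hypothetical choices, not the authors' settings), the posterior-smoothing and confidence step could look like this. The function names `smooth_posteriors` and `detect_keyword` are illustrative only.

```python
from typing import List, Optional, Sequence


def smooth_posteriors(posteriors: Sequence[float], window: int = 30) -> List[float]:
    """Causal moving average of per-frame wake-up-word posteriors.

    Assumption: each smoothed value is the mean of the current frame and up to
    `window - 1` preceding frames (fewer at the start of the stream).
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    smoothed: List[float] = []
    running_sum = 0.0
    for t, p in enumerate(posteriors):
        running_sum += p
        if t >= window:
            # Drop the frame that just left the sliding window.
            running_sum -= posteriors[t - window]
        count = min(t + 1, window)
        smoothed.append(running_sum / count)
    return smoothed


def detect_keyword(
    posteriors: Sequence[float], window: int = 30, threshold: float = 0.8
) -> Optional[int]:
    """Return the first frame whose smoothed confidence reaches `threshold`.

    The smoothed posterior is used directly as the confidence score here;
    this is an assumed, simplified decision rule. Returns None if the
    keyword is never detected.
    """
    for t, score in enumerate(smooth_posteriors(posteriors, window)):
        if score >= threshold:
            return t
    return None


if __name__ == "__main__":
    # Synthetic posteriors: background, then a wake-up word, then background.
    frames = [0.1] * 20 + [0.95] * 40 + [0.1] * 20
    print(detect_keyword(frames, window=30, threshold=0.8))  # → 44
```

Smoothing suppresses isolated high-posterior frames, so a detection fires only after the acoustic model has been confident for a sustained stretch; the window length trades detection latency against false alarms.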

Original language: English
Title of host publication: ICCSI 2023 - 2023 International Conference on Cyber-Physical Social Intelligence
Publisher: Institute of Electrical and Electronics Engineers Inc.
Number of pages: 6
ISBN (Electronic): 9798350312492
Publication status: Published - 2023
Event: 2023 International Conference on Cyber-Physical Social Intelligence, ICCSI 2023 - Xi'an, China
Duration: 20 Oct 2023 to 23 Oct 2023

Publication series

Name: ICCSI 2023 - 2023 International Conference on Cyber-Physical Social Intelligence


Conference: 2023 International Conference on Cyber-Physical Social Intelligence, ICCSI 2023


  • End-to-end
  • intelligent cabin
  • speech keyword spotting

