TY - GEN
T1 - End-to-End Speech Keyword Spotting System
AU - Hu, Shenghua
AU - Liu, Hanyue
AU - Xu, Liang
AU - Wang, Jing
AU - Wang, Yujun
AU - Gao, Peng
AU - Zhuang, Weiji
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Speech keyword spotting detects a set of predefined keywords in a continuous speech stream. Building on end-to-end deep learning techniques, this paper designs and implements an end-to-end speech keyword spotting algorithm with applications in areas such as smartphones and automobiles. The algorithm first trains an acoustic model based on a deep neural network, which takes acoustic features as input and outputs the posterior probability of the wake-up word. The posterior probability is then smoothed to obtain a confidence score for the wake-up word, which avoids the traditional decoding process. The paper also compares several neural network structures for the acoustic model, such as the time-delay neural network (TDNN) and the factorized time-delay neural network (TDNN-F). Controlled comparative experiments show that the proposed end-to-end speech keyword spotting algorithm performs competitively with other popular techniques.
AB - Speech keyword spotting detects a set of predefined keywords in a continuous speech stream. Building on end-to-end deep learning techniques, this paper designs and implements an end-to-end speech keyword spotting algorithm with applications in areas such as smartphones and automobiles. The algorithm first trains an acoustic model based on a deep neural network, which takes acoustic features as input and outputs the posterior probability of the wake-up word. The posterior probability is then smoothed to obtain a confidence score for the wake-up word, which avoids the traditional decoding process. The paper also compares several neural network structures for the acoustic model, such as the time-delay neural network (TDNN) and the factorized time-delay neural network (TDNN-F). Controlled comparative experiments show that the proposed end-to-end speech keyword spotting algorithm performs competitively with other popular techniques.
KW - End-to-end
KW - intelligent cabin
KW - speech keyword spotting
UR - http://www.scopus.com/inward/record.url?scp=85179009754&partnerID=8YFLogxK
U2 - 10.1109/ICCSI58851.2023.10304048
DO - 10.1109/ICCSI58851.2023.10304048
M3 - Conference contribution
AN - SCOPUS:85179009754
T3 - ICCSI 2023 - 2023 International Conference on Cyber-Physical Social Intelligence
SP - 215
EP - 220
BT - ICCSI 2023 - 2023 International Conference on Cyber-Physical Social Intelligence
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2023 International Conference on Cyber-Physical Social Intelligence, ICCSI 2023
Y2 - 20 October 2023 through 23 October 2023
ER -