End-to-End Speech Keyword Spotting System

Shenghua Hu; Hanyue Liu; Liang Xu; Jing Wang; Yujun Wang; Peng Gao; Weiji Zhuang

doi:10.1109/ICCSI58851.2023.10304048

Shenghua Hu^*, Hanyue Liu, Liang Xu, Jing Wang, Yujun Wang, Peng Gao, Weiji Zhuang

^*Corresponding author for this work

School of Information and Electronics

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Speech keyword spotting aims to detect a set of predefined keywords in a continuous speech signal stream. Building on research into end-to-end deep learning techniques, this paper designs and implements an end-to-end speech keyword spotting algorithm with applications in areas such as smartphones and automobiles. The algorithm first trains an acoustic model based on a deep neural network, which receives acoustic features and outputs the posterior probability of the wake-up word. The posterior probability is then smoothed to obtain the confidence score of the wake-up word. This process effectively avoids the traditional decoding step. In addition, the paper compares several neural network structures for the acoustic model, such as the time-delay neural network (TDNN) and the factorized time-delay neural network (TDNN-F). Controlled comparative experiments verify that the proposed end-to-end speech keyword spotting algorithm performs competitively with other popular techniques.
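The abstract does not say how the posterior probability is smoothed or how the confidence score is turned into a decision. The following is a minimal sketch of one common choice, assuming a moving average over recent frames and a fixed decision threshold. The function names, the `window` size, and the `threshold` value are hypothetical and are not taken from the paper.

```python
from collections import deque
from typing import Iterable, List, Tuple


def smooth_posteriors(posteriors: Iterable[float], window: int = 30) -> List[float]:
    """Average each frame's wake-up word posterior with up to `window - 1` previous frames.

    Assumption: a plain moving average; the paper may use a different smoother.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    recent = deque(maxlen=window)  # holds the most recent `window` posteriors
    running_sum = 0.0
    smoothed = []
    for p in posteriors:
        if len(recent) == window:
            # The oldest value is about to be evicted by append(); remove it from the sum.
            running_sum -= recent[0]
        recent.append(p)
        running_sum += p
        smoothed.append(running_sum / len(recent))
    return smoothed


def detect_wake_word(
    posteriors: Iterable[float], window: int = 30, threshold: float = 0.8
) -> Tuple[float, bool]:
    """Return (confidence, detected), where confidence is the peak smoothed posterior.

    Assumption: the decision is a simple comparison against a fixed threshold.
    """
    smoothed = smooth_posteriors(posteriors, window)
    confidence = max(smoothed, default=0.0)
    return confidence, confidence >= threshold


if __name__ == "__main__":
    # Hypothetical per-frame posteriors, as an acoustic model might emit them.
    frames = [0.0, 0.0, 1.0, 1.0, 1.0, 0.0]
    print(detect_wake_word(frames, window=2, threshold=0.8))
```

In this sketch, a sustained run of high posteriors raises the confidence score, while an isolated single-frame spike is averaged down, which is the usual motivation for smoothing before thresholding. No decoding graph or search is involved, in line with the abstract's point about avoiding the traditional decoding process.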

Original language	English
Title of host publication	ICCSI 2023 - 2023 International Conference on Cyber-Physical Social Intelligence
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	215-220
Number of pages	6
ISBN (Electronic)	9798350312492
DOIs	https://doi.org/10.1109/ICCSI58851.2023.10304048
Publication status	Published - 2023
Event	2023 International Conference on Cyber-Physical Social Intelligence, ICCSI 2023 - Xi'an, China
Duration	20 Oct 2023 → 23 Oct 2023

Publication series

Name	ICCSI 2023 - 2023 International Conference on Cyber-Physical Social Intelligence

Conference

Conference	2023 International Conference on Cyber-Physical Social Intelligence, ICCSI 2023
Country/Territory	China
City	Xi'an
Period	20/10/23 → 23/10/23

Keywords

End-to-end
intelligent cabin
speech keyword spotting

Cite this

Hu, S., Liu, H., Xu, L., Wang, J., Wang, Y., Gao, P., & Zhuang, W. (2023). End-to-End Speech Keyword Spotting System. In ICCSI 2023 - 2023 International Conference on Cyber-Physical Social Intelligence (pp. 215-220). (ICCSI 2023 - 2023 International Conference on Cyber-Physical Social Intelligence). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICCSI58851.2023.10304048

@inproceedings{2d7d6cfc08cc47ef8f49f9fd0c0eb821,

title = "End-to-End Speech Keyword Spotting System",

abstract = "The purpose of speech keyword spotting is to detect a set of predefined keywords from a continuous speech signal stream. Based on the research on end-to-end technologies in the field of deep learning, this paper designs and implements an end-to-end speech keyword spotting algorithm, which has a wide range of applications in various fields, such as smartphones and automobiles. The algorithm first trains an acoustic model based on a deep neural network, which receives the acoustic features and outputs the posterior probability of the wake-up word. Then, the posterior probability is smoothed to obtain the confidence score of the wake-up word. Through the above process, the traditional decoding process can be avoided effectively. In addition, this paper compares various neural network structures of acoustic model, such as the time-delay neural network (TDNN) and the factorized time-delay neural network (TDNN-F). Through comparative experiments by controlling variables, it is verified that the proposed end-to-end speech keyword spotting algorithm has competitive performance compared with the other popular technologies.",

keywords = "End-to-end, intelligent cabin, speech keyword spotting",

author = "Shenghua Hu and Hanyue Liu and Liang Xu and Jing Wang and Yujun Wang and Peng Gao and Weiji Zhuang",

note = "Publisher Copyright: {\textcopyright} 2023 IEEE.; 2023 International Conference on Cyber-Physical Social Intelligence, ICCSI 2023 ; Conference date: 20-10-2023 Through 23-10-2023",

year = "2023",

doi = "10.1109/ICCSI58851.2023.10304048",

language = "English",

series = "ICCSI 2023 - 2023 International Conference on Cyber-Physical Social Intelligence",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "215--220",

booktitle = "ICCSI 2023 - 2023 International Conference on Cyber-Physical Social Intelligence",

address = "United States",

}

Hu, S, Liu, H, Xu, L, Wang, J, Wang, Y, Gao, P & Zhuang, W 2023, End-to-End Speech Keyword Spotting System. in ICCSI 2023 - 2023 International Conference on Cyber-Physical Social Intelligence. ICCSI 2023 - 2023 International Conference on Cyber-Physical Social Intelligence, Institute of Electrical and Electronics Engineers Inc., pp. 215-220, 2023 International Conference on Cyber-Physical Social Intelligence, ICCSI 2023, Xi'an, China, 20/10/23. https://doi.org/10.1109/ICCSI58851.2023.10304048

End-to-End Speech Keyword Spotting System. / Hu, Shenghua; Liu, Hanyue; Xu, Liang et al.
ICCSI 2023 - 2023 International Conference on Cyber-Physical Social Intelligence. Institute of Electrical and Electronics Engineers Inc., 2023. p. 215-220 (ICCSI 2023 - 2023 International Conference on Cyber-Physical Social Intelligence).

TY - GEN

T1 - End-to-End Speech Keyword Spotting System

AU - Hu, Shenghua

AU - Liu, Hanyue

AU - Xu, Liang

AU - Wang, Jing

AU - Wang, Yujun

AU - Gao, Peng

AU - Zhuang, Weiji

PY - 2023

Y1 - 2023

AB - The purpose of speech keyword spotting is to detect a set of predefined keywords from a continuous speech signal stream. Based on the research on end-to-end technologies in the field of deep learning, this paper designs and implements an end-to-end speech keyword spotting algorithm, which has a wide range of applications in various fields, such as smartphones and automobiles. The algorithm first trains an acoustic model based on a deep neural network, which receives the acoustic features and outputs the posterior probability of the wake-up word. Then, the posterior probability is smoothed to obtain the confidence score of the wake-up word. Through the above process, the traditional decoding process can be avoided effectively. In addition, this paper compares various neural network structures of acoustic model, such as the time-delay neural network (TDNN) and the factorized time-delay neural network (TDNN-F). Through comparative experiments by controlling variables, it is verified that the proposed end-to-end speech keyword spotting algorithm has competitive performance compared with the other popular technologies.

KW - End-to-end

KW - intelligent cabin

KW - speech keyword spotting

UR - http://www.scopus.com/inward/record.url?scp=85179009754&partnerID=8YFLogxK

U2 - 10.1109/ICCSI58851.2023.10304048

DO - 10.1109/ICCSI58851.2023.10304048

M3 - Conference contribution

AN - SCOPUS:85179009754

T3 - ICCSI 2023 - 2023 International Conference on Cyber-Physical Social Intelligence

SP - 215

EP - 220

BT - ICCSI 2023 - 2023 International Conference on Cyber-Physical Social Intelligence

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 2023 International Conference on Cyber-Physical Social Intelligence, ICCSI 2023

Y2 - 20 October 2023 through 23 October 2023

ER -

Hu S, Liu H, Xu L, Wang J, Wang Y, Gao P et al. End-to-End Speech Keyword Spotting System. In ICCSI 2023 - 2023 International Conference on Cyber-Physical Social Intelligence. Institute of Electrical and Electronics Engineers Inc. 2023. p. 215-220. (ICCSI 2023 - 2023 International Conference on Cyber-Physical Social Intelligence). doi: 10.1109/ICCSI58851.2023.10304048
