End-to-End Speech Keyword Spotting System

Shenghua Hu; Hanyue Liu; Liang Xu; Jing Wang; Yujun Wang; Peng Gao; Weiji Zhuang

doi:10.1109/ICCSI58851.2023.10304048

Shenghua Hu*, Hanyue Liu, Liang Xu, Jing Wang, Yujun Wang, Peng Gao, Weiji Zhuang

* Corresponding author for this work

School of Information and Electronics

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The purpose of speech keyword spotting is to detect a set of predefined keywords in a continuous speech signal stream. Building on research into end-to-end technologies in deep learning, this paper designs and implements an end-to-end speech keyword spotting algorithm, which has a wide range of applications in fields such as smartphones and automobiles. The algorithm first trains an acoustic model based on a deep neural network, which receives acoustic features and outputs the posterior probability of the wake-up word. The posterior probability is then smoothed to obtain the confidence score of the wake-up word. This process effectively avoids the traditional decoding step. In addition, this paper compares several neural network structures for the acoustic model, such as the time-delay neural network (TDNN) and the factorized time-delay neural network (TDNN-F). Controlled comparative experiments verify that the proposed end-to-end speech keyword spotting algorithm performs competitively with other popular techniques.
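The smoothing step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say which smoothing rule, window length, or decision threshold the paper uses, so the causal moving average, the `window` default, the `threshold` default, and all function names below are assumptions chosen for illustration only.

```python
from typing import List


def smooth_posteriors(posteriors: List[float], window: int = 30) -> List[float]:
    """Smooth per-frame wake-word posteriors with a causal moving average.

    Assumption: a plain moving average over the last `window` frames;
    the paper's actual smoothing rule is not given in the abstract.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    running_sum = 0.0
    for i, p in enumerate(posteriors):
        running_sum += p
        if i >= window:
            # Drop the frame that just left the window.
            running_sum -= posteriors[i - window]
        # Early frames are averaged over the frames seen so far.
        smoothed.append(running_sum / min(i + 1, window))
    return smoothed


def confidence_score(posteriors: List[float], window: int = 30) -> float:
    """Take the peak of the smoothed posterior as the confidence (assumed rule)."""
    smoothed = smooth_posteriors(posteriors, window)
    return max(smoothed) if smoothed else 0.0


def is_keyword_detected(
    posteriors: List[float], window: int = 30, threshold: float = 0.5
) -> bool:
    """Report a detection when the confidence reaches a hypothetical threshold."""
    return confidence_score(posteriors, window) >= threshold
```

In this sketch a single high-posterior frame is averaged down by its neighbours, while a sustained run of high posteriors keeps a high confidence, which is the usual motivation for smoothing before thresholding. No decoder or search graph is involved, which matches the abstract's point that the traditional decoding step is avoided.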

Original language	English
Title of host publication	ICCSI 2023 - 2023 International Conference on Cyber-Physical Social Intelligence
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	215-220
Number of pages	6
ISBN (Electronic)	9798350312492
DOI	https://doi.org/10.1109/ICCSI58851.2023.10304048
Publication status	Published - 2023
Event	2023 International Conference on Cyber-Physical Social Intelligence, ICCSI 2023 - Xi'an, China. Duration: 20 Oct 2023 → 23 Oct 2023

Publication series

Name	ICCSI 2023 - 2023 International Conference on Cyber-Physical Social Intelligence

Conference

Conference	2023 International Conference on Cyber-Physical Social Intelligence, ICCSI 2023
Country/Territory	China
City	Xi'an
Period	20/10/23 → 23/10/23

Access to document

10.1109/ICCSI58851.2023.10304048

Cite this

Hu, S., Liu, H., Xu, L., Wang, J., Wang, Y., Gao, P., & Zhuang, W. (2023). End-to-End Speech Keyword Spotting System. In ICCSI 2023 - 2023 International Conference on Cyber-Physical Social Intelligence (pp. 215-220). (ICCSI 2023 - 2023 International Conference on Cyber-Physical Social Intelligence). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICCSI58851.2023.10304048

@inproceedings{2d7d6cfc08cc47ef8f49f9fd0c0eb821,

title = "End-to-End Speech Keyword Spotting System",

abstract = "The purpose of speech keyword spotting is to detect a set of predefined keywords from a continuous speech signal stream. Based on the research on end-to-end technologies in the field of deep learning, this paper designs and implements an end-to-end speech keyword spotting algorithm, which has a wide range of applications in various fields, such as smartphones and automobiles. The algorithm first trains an acoustic model based on a deep neural network, which receives the acoustic features and outputs the posterior probability of the wake-up word. Then, the posterior probability is smoothed to obtain the confidence score of the wake-up word. Through the above process, the traditional decoding process can be avoided effectively. In addition, this paper compares various neural network structures of acoustic model, such as the time-delay neural network (TDNN) and the factorized time-delay neural network (TDNN-F). Through comparative experiments by controlling variables, it is verified that the proposed end-to-end speech keyword spotting algorithm has competitive performance compared with the other popular technologies.",

keywords = "End-to-end, intelligent cabin, speech keyword spotting",

author = "Shenghua Hu and Hanyue Liu and Liang Xu and Jing Wang and Yujun Wang and Peng Gao and Weiji Zhuang",

note = "Publisher Copyright: {\textcopyright} 2023 IEEE.; 2023 International Conference on Cyber-Physical Social Intelligence, ICCSI 2023 ; Conference date: 20-10-2023 Through 23-10-2023",

year = "2023",

doi = "10.1109/ICCSI58851.2023.10304048",

language = "English",

series = "ICCSI 2023 - 2023 International Conference on Cyber-Physical Social Intelligence",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "215--220",

booktitle = "ICCSI 2023 - 2023 International Conference on Cyber-Physical Social Intelligence",

address = "United States",

}

Hu, S, Liu, H, Xu, L, Wang, J, Wang, Y, Gao, P & Zhuang, W 2023, End-to-End Speech Keyword Spotting System. in ICCSI 2023 - 2023 International Conference on Cyber-Physical Social Intelligence. ICCSI 2023 - 2023 International Conference on Cyber-Physical Social Intelligence, Institute of Electrical and Electronics Engineers Inc., pp. 215-220, 2023 International Conference on Cyber-Physical Social Intelligence, ICCSI 2023, Xi'an, China, 20/10/23. https://doi.org/10.1109/ICCSI58851.2023.10304048

End-to-End Speech Keyword Spotting System. / Hu, Shenghua; Liu, Hanyue; Xu, Liang et al.
ICCSI 2023 - 2023 International Conference on Cyber-Physical Social Intelligence. Institute of Electrical and Electronics Engineers Inc., 2023. p. 215-220 (ICCSI 2023 - 2023 International Conference on Cyber-Physical Social Intelligence).

TY - GEN

T1 - End-to-End Speech Keyword Spotting System

AU - Hu, Shenghua

AU - Liu, Hanyue

AU - Xu, Liang

AU - Wang, Jing

AU - Wang, Yujun

AU - Gao, Peng

AU - Zhuang, Weiji

PY - 2023

Y1 - 2023

AB - The purpose of speech keyword spotting is to detect a set of predefined keywords from a continuous speech signal stream. Based on the research on end-to-end technologies in the field of deep learning, this paper designs and implements an end-to-end speech keyword spotting algorithm, which has a wide range of applications in various fields, such as smartphones and automobiles. The algorithm first trains an acoustic model based on a deep neural network, which receives the acoustic features and outputs the posterior probability of the wake-up word. Then, the posterior probability is smoothed to obtain the confidence score of the wake-up word. Through the above process, the traditional decoding process can be avoided effectively. In addition, this paper compares various neural network structures of acoustic model, such as the time-delay neural network (TDNN) and the factorized time-delay neural network (TDNN-F). Through comparative experiments by controlling variables, it is verified that the proposed end-to-end speech keyword spotting algorithm has competitive performance compared with the other popular technologies.

KW - End-to-end

KW - intelligent cabin

KW - speech keyword spotting

UR - http://www.scopus.com/inward/record.url?scp=85179009754&partnerID=8YFLogxK

U2 - 10.1109/ICCSI58851.2023.10304048

DO - 10.1109/ICCSI58851.2023.10304048

M3 - Conference contribution

AN - SCOPUS:85179009754

T3 - ICCSI 2023 - 2023 International Conference on Cyber-Physical Social Intelligence

SP - 215

EP - 220

BT - ICCSI 2023 - 2023 International Conference on Cyber-Physical Social Intelligence

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 2023 International Conference on Cyber-Physical Social Intelligence, ICCSI 2023

Y2 - 20 October 2023 through 23 October 2023

ER -

Hu S, Liu H, Xu L, Wang J, Wang Y, Gao P et al. End-to-End Speech Keyword Spotting System. In ICCSI 2023 - 2023 International Conference on Cyber-Physical Social Intelligence. Institute of Electrical and Electronics Engineers Inc. 2023. p. 215-220. (ICCSI 2023 - 2023 International Conference on Cyber-Physical Social Intelligence). doi: 10.1109/ICCSI58851.2023.10304048
