A Modified Speaking Rate Estimation Based on Frame-Level LSTM

Yanhong Xiao, Shixuan Du, Xiang Xie, Jing Wang, Qingran Zhan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-review

1 Citation (Scopus)

Abstract

Speaking rate has applications in many domains, such as speech recognition, speaker verification, and emotion recognition. It conveys long-term information in speech and changes over time, so it can be treated as a time sequence. This paper proposes a frame-level LSTM speaking rate estimation method. Instead of taking the whole utterance as one sequence, the frame-level LSTM exploits the sequence information within each segment, which yields a more precise segment-level speaking rate estimate. We also evaluate how fixed-length segmentation and voice activity detection (VAD) segmentation affect speaking rate estimation. Results show that the proposed frame-level LSTM method yields a high correlation between the estimated speaking rate and the ground truth. It achieves a relative improvement of 13.0% over the state-of-the-art statistical learning method and 16.3% over support vector regression (SVR), both evaluated on the same TIMIT corpus.
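The abstract compares fixed-length and VAD segmentation and scores estimates by their correlation with the ground truth. The following is a minimal sketch of those evaluation pieces only, not the authors' method. Several parts are assumptions rather than details taken from the paper. The segment length is arbitrary. A simple energy-threshold detector stands in for whatever VAD the paper used. Pearson correlation is used as the metric. Relative improvement is taken as (new - baseline) / baseline. The function names are hypothetical, and the frame-level LSTM itself is not shown.

```python
import math
from typing import List, Sequence, Tuple

Segment = Tuple[int, int]  # half-open frame range [start, end)


def fixed_length_segments(num_frames: int, seg_len: int) -> List[Segment]:
    """Split [0, num_frames) into consecutive chunks of seg_len frames.

    The last chunk may be shorter. seg_len is a hypothetical parameter.
    """
    return [(s, min(s + seg_len, num_frames)) for s in range(0, num_frames, seg_len)]


def energy_vad_segments(energies: Sequence[float], threshold: float) -> List[Segment]:
    """Hypothetical stand-in VAD: return runs of frames whose energy exceeds threshold."""
    segments: List[Segment] = []
    start = None
    for i, e in enumerate(energies):
        if e > threshold and start is None:
            start = i  # a voiced run begins
        elif e <= threshold and start is not None:
            segments.append((start, i))  # the voiced run ends before frame i
            start = None
    if start is not None:
        segments.append((start, len(energies)))  # the run reaches the last frame
    return segments


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between estimated and reference per-segment rates."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def relative_improvement(new: float, baseline: float) -> float:
    """Assumed definition: (new - baseline) / baseline."""
    return (new - baseline) / baseline


if __name__ == "__main__":
    print(fixed_length_segments(10, 4))
    print(energy_vad_segments([0.0, 1.0, 1.0, 0.0, 0.0, 1.0], 0.5))
```

With fixed-length segmentation, segment boundaries ignore pauses. The VAD variant splits at low-energy frames, so each segment covers one continuous stretch of speech.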

Original language: English
Host publication title: 2018 14th IEEE International Conference on Signal Processing Proceedings, ICSP 2018
Editors: Yuan Baozong, Ruan Qiuqi, Zhao Yao, An Gaoyun
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 600-603
Number of pages: 4
ISBN (Electronic): 9781538646724
DOI
Publication status: Published - 2 Feb 2019
Event: 14th IEEE International Conference on Signal Processing, ICSP 2018 - Beijing, China
Duration: 12 Aug 2018 → 16 Aug 2018

Publication series

Name: International Conference on Signal Processing Proceedings, ICSP
Volume: 2018-August

Conference

Conference: 14th IEEE International Conference on Signal Processing, ICSP 2018
Country/Territory: China
City: Beijing
Period: 12/08/18 → 16/08/18
