TY - GEN
T1 - An improvement of the degradation of speaker recognition in continuous cold speech for home assistant
AU - Ai, Haojun
AU - Wang, Yifeng
AU - Yang, Yuhong
AU - Zhang, Quanxin
N1 - Publisher Copyright:
© 2019, Springer Nature Switzerland AG.
PY - 2019
Y1 - 2019
N2 - Home assistant with speech user interfaces is quite welcomed due to its convenience in recent years. With speaker recognition (SR) technology in this application, personalized services (e.g., playing music, making to-do lists) for different family members become reality. However, the SR accuracy may decline sharply when a family has a cold due to the restriction of hardware and response time. In this paper, we propose a dual model updating strategy based on cold detection to maintain all speaker voice models. In this method, time domain and frequency domain features would be combined to detect continuous cold speech. And then, corresponding models would be selected to determine the identity according to the results of the detection. In order to continuously track SR performance based on data of mobile phone usage, a new mobile phone-based speech dataset (PBSD) which contains voice, phone model, and user’s state of physical wellness has been constructed. Besides, the relationship between SR accuracy and users’ state of physical wellness also has been analyzed based on a GMM-UBM framework. Finally, to evaluate performance of the proposed method, experiments focused on SR accuracy of 10 speakers from both cold-suffering and healthy states have been conducted. The results demonstrated that the SR accuracy can be improved effectively by the cold detection-based model updating strategy, especially in a cold-suffering circumstance.
AB - Home assistant with speech user interfaces is quite welcomed due to its convenience in recent years. With speaker recognition (SR) technology in this application, personalized services (e.g., playing music, making to-do lists) for different family members become reality. However, the SR accuracy may decline sharply when a family has a cold due to the restriction of hardware and response time. In this paper, we propose a dual model updating strategy based on cold detection to maintain all speaker voice models. In this method, time domain and frequency domain features would be combined to detect continuous cold speech. And then, corresponding models would be selected to determine the identity according to the results of the detection. In order to continuously track SR performance based on data of mobile phone usage, a new mobile phone-based speech dataset (PBSD) which contains voice, phone model, and user’s state of physical wellness has been constructed. Besides, the relationship between SR accuracy and users’ state of physical wellness also has been analyzed based on a GMM-UBM framework. Finally, to evaluate performance of the proposed method, experiments focused on SR accuracy of 10 speakers from both cold-suffering and healthy states have been conducted. The results demonstrated that the SR accuracy can be improved effectively by the cold detection-based model updating strategy, especially in a cold-suffering circumstance.
KW - Cold
KW - Database
KW - GMM
KW - MFCC
KW - Speaker recognition
UR - http://www.scopus.com/inward/record.url?scp=85078543055&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-37337-5_29
DO - 10.1007/978-3-030-37337-5_29
M3 - Conference contribution
AN - SCOPUS:85078543055
SN - 9783030373368
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 363
EP - 373
BT - Cyberspace Safety and Security - 11th International Symposium, CSS 2019, Proceedings
A2 - Vaidya, Jaideep
A2 - Zhang, Xiao
A2 - Li, Jin
PB - Springer
T2 - 11th International Symposium on Cyberspace Safety and Security, CSS 2019
Y2 - 1 December 2019 through 3 December 2019
ER -