An improvement of the degradation of speaker recognition in continuous cold speech for home assistant

Haojun Ai; Yifeng Wang; Yuhong Yang; Quanxin Zhang

doi:10.1007/978-3-030-37337-5_29

Haojun Ai, Yifeng Wang, Yuhong Yang^*, Quanxin Zhang

^*Corresponding author for this work

School of Computer Science and Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

4 Citations (Scopus)

Abstract

Home assistants with speech user interfaces have become popular in recent years because of their convenience. With speaker recognition (SR) technology in this application, personalized services (e.g., playing music, making to-do lists) for different family members become a reality. However, because of hardware and response-time restrictions, SR accuracy may decline sharply when a family member has a cold. In this paper, we propose a dual model updating strategy based on cold detection to maintain all speaker voice models. In this method, time-domain and frequency-domain features are combined to detect continuous cold speech, and the corresponding models are then selected to determine the speaker’s identity according to the detection results. To continuously track SR performance based on mobile phone usage data, we constructed a new mobile phone-based speech dataset (PBSD) that contains voice, phone model, and the user’s state of physical wellness. We also analyzed the relationship between SR accuracy and users’ state of physical wellness using a GMM-UBM framework. Finally, to evaluate the proposed method, we conducted experiments on the SR accuracy of 10 speakers in both cold-suffering and healthy states. The results demonstrate that the cold detection-based model updating strategy effectively improves SR accuracy, especially under cold-suffering conditions.

Original language	English
Title of host publication	Cyberspace Safety and Security - 11th International Symposium, CSS 2019, Proceedings
Editors	Jaideep Vaidya, Xiao Zhang, Jin Li
Publisher	Springer
Pages	363-373
Number of pages	11
ISBN (Print)	9783030373368
DOIs	https://doi.org/10.1007/978-3-030-37337-5_29
Publication status	Published - 2019
Event	11th International Symposium on Cyberspace Safety and Security, CSS 2019 - Guangzhou, China; Duration: 1 Dec 2019 → 3 Dec 2019

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	11982 LNCS
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Conference

Conference	11th International Symposium on Cyberspace Safety and Security, CSS 2019
Country/Territory	China
City	Guangzhou
Period	1/12/19 → 3/12/19

Keywords

Cold
Database
GMM
MFCC
Speaker recognition

Access to Document

10.1007/978-3-030-37337-5_29

Cite this

Ai, H., Wang, Y., Yang, Y., & Zhang, Q. (2019). An improvement of the degradation of speaker recognition in continuous cold speech for home assistant. In J. Vaidya, X. Zhang, & J. Li (Eds.), Cyberspace Safety and Security - 11th International Symposium, CSS 2019, Proceedings (pp. 363-373). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11982 LNCS). Springer. https://doi.org/10.1007/978-3-030-37337-5_29

Ai, Haojun ; Wang, Yifeng ; Yang, Yuhong et al. / An improvement of the degradation of speaker recognition in continuous cold speech for home assistant. Cyberspace Safety and Security - 11th International Symposium, CSS 2019, Proceedings. editor / Jaideep Vaidya ; Xiao Zhang ; Jin Li. Springer, 2019. pp. 363-373 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

@inproceedings{eb35bdbbbe594adca5e08fb611ae08ec,

title = "An improvement of the degradation of speaker recognition in continuous cold speech for home assistant",

abstract = "Home assistants with speech user interfaces have become popular in recent years because of their convenience. With speaker recognition (SR) technology in this application, personalized services (e.g., playing music, making to-do lists) for different family members become a reality. However, because of hardware and response-time restrictions, SR accuracy may decline sharply when a family member has a cold. In this paper, we propose a dual model updating strategy based on cold detection to maintain all speaker voice models. In this method, time-domain and frequency-domain features are combined to detect continuous cold speech, and the corresponding models are then selected to determine the speaker{\textquoteright}s identity according to the detection results. To continuously track SR performance based on mobile phone usage data, we constructed a new mobile phone-based speech dataset (PBSD) that contains voice, phone model, and the user{\textquoteright}s state of physical wellness. We also analyzed the relationship between SR accuracy and users{\textquoteright} state of physical wellness using a GMM-UBM framework. Finally, to evaluate the proposed method, we conducted experiments on the SR accuracy of 10 speakers in both cold-suffering and healthy states. The results demonstrate that the cold detection-based model updating strategy effectively improves SR accuracy, especially under cold-suffering conditions.",

keywords = "Cold, Database, GMM, MFCC, Speaker recognition",

author = "Haojun Ai and Yifeng Wang and Yuhong Yang and Quanxin Zhang",

note = "Publisher Copyright: {\textcopyright} 2019, Springer Nature Switzerland AG.; 11th International Symposium on Cyberspace Safety and Security, CSS 2019 ; Conference date: 01-12-2019 Through 03-12-2019",

year = "2019",

doi = "10.1007/978-3-030-37337-5_29",

language = "English",

isbn = "9783030373368",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

publisher = "Springer",

pages = "363--373",

editor = "Jaideep Vaidya and Xiao Zhang and Jin Li",

booktitle = "Cyberspace Safety and Security - 11th International Symposium, CSS 2019, Proceedings",

address = "Germany",

}

Ai, H, Wang, Y, Yang, Y & Zhang, Q 2019, An improvement of the degradation of speaker recognition in continuous cold speech for home assistant. in J Vaidya, X Zhang & J Li (eds), Cyberspace Safety and Security - 11th International Symposium, CSS 2019, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 11982 LNCS, Springer, pp. 363-373, 11th International Symposium on Cyberspace Safety and Security, CSS 2019, Guangzhou, China, 1/12/19. https://doi.org/10.1007/978-3-030-37337-5_29

An improvement of the degradation of speaker recognition in continuous cold speech for home assistant. / Ai, Haojun; Wang, Yifeng; Yang, Yuhong et al.
Cyberspace Safety and Security - 11th International Symposium, CSS 2019, Proceedings. ed. / Jaideep Vaidya; Xiao Zhang; Jin Li. Springer, 2019. p. 363-373 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11982 LNCS).

TY - GEN

T1 - An improvement of the degradation of speaker recognition in continuous cold speech for home assistant

AU - Ai, Haojun

AU - Wang, Yifeng

AU - Yang, Yuhong

AU - Zhang, Quanxin

PY - 2019

Y1 - 2019

AB - Home assistants with speech user interfaces have become popular in recent years because of their convenience. With speaker recognition (SR) technology in this application, personalized services (e.g., playing music, making to-do lists) for different family members become a reality. However, because of hardware and response-time restrictions, SR accuracy may decline sharply when a family member has a cold. In this paper, we propose a dual model updating strategy based on cold detection to maintain all speaker voice models. In this method, time-domain and frequency-domain features are combined to detect continuous cold speech, and the corresponding models are then selected to determine the speaker’s identity according to the detection results. To continuously track SR performance based on mobile phone usage data, we constructed a new mobile phone-based speech dataset (PBSD) that contains voice, phone model, and the user’s state of physical wellness. We also analyzed the relationship between SR accuracy and users’ state of physical wellness using a GMM-UBM framework. Finally, to evaluate the proposed method, we conducted experiments on the SR accuracy of 10 speakers in both cold-suffering and healthy states. The results demonstrate that the cold detection-based model updating strategy effectively improves SR accuracy, especially under cold-suffering conditions.

KW - Cold

KW - Database

KW - GMM

KW - MFCC

KW - Speaker recognition

UR - http://www.scopus.com/inward/record.url?scp=85078543055&partnerID=8YFLogxK

U2 - 10.1007/978-3-030-37337-5_29

DO - 10.1007/978-3-030-37337-5_29

M3 - Conference contribution

AN - SCOPUS:85078543055

SN - 9783030373368

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 363

EP - 373

BT - Cyberspace Safety and Security - 11th International Symposium, CSS 2019, Proceedings

A2 - Vaidya, Jaideep

A2 - Zhang, Xiao

A2 - Li, Jin

PB - Springer

T2 - 11th International Symposium on Cyberspace Safety and Security, CSS 2019

Y2 - 1 December 2019 through 3 December 2019

ER -

Ai H, Wang Y, Yang Y, Zhang Q. An improvement of the degradation of speaker recognition in continuous cold speech for home assistant. In Vaidya J, Zhang X, Li J, editors, Cyberspace Safety and Security - 11th International Symposium, CSS 2019, Proceedings. Springer. 2019. p. 363-373. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-030-37337-5_29
