TY - GEN
T1 - Audio Replay Spoof Attack Detection Using A GMM-RFPNN Model as Back-end Classifier
AU - Qi, Kaikai
AU - Huang, Wei
AU - Wang, Dan
AU - Zhang, Honghao
N1 - Publisher Copyright:
© 2021 IEEE.
PY - 2021
Y1 - 2021
N2 - Automatic speaker verification (ASV) has received considerable academic attention in recent years and is beginning to be applied to authentication, but research on the security of ASV systems is still at an early stage. This paper focuses on detecting speech replay spoofing attacks against speaker authentication. Voice is a behavioral biometric trait with high inter-class variability that is susceptible to environmental and temporal influences. Classical constant Q cepstral coefficient (CQCC) features and Gaussian super-vectors serve as the front-end feature extractor, and a regularized fuzzy polynomial neural network (FPNN) model serves as the back-end classifier for distinguishing genuine from spoofed speech. Compared with other traditional machine learning and deep learning models, the proposed model is more robust and generalizes better across acoustic environments and over time, and it achieves good detection results when trained on a small number of samples. On the ASVspoof 2017 version 2.0 database, detection performance improves by about 39% over the original baseline system.
AB - Automatic speaker verification (ASV) has received considerable academic attention in recent years and is beginning to be applied to authentication, but research on the security of ASV systems is still at an early stage. This paper focuses on detecting speech replay spoofing attacks against speaker authentication. Voice is a behavioral biometric trait with high inter-class variability that is susceptible to environmental and temporal influences. Classical constant Q cepstral coefficient (CQCC) features and Gaussian super-vectors serve as the front-end feature extractor, and a regularized fuzzy polynomial neural network (FPNN) model serves as the back-end classifier for distinguishing genuine from spoofed speech. Compared with other traditional machine learning and deep learning models, the proposed model is more robust and generalizes better across acoustic environments and over time, and it achieves good detection results when trained on a small number of samples. On the ASVspoof 2017 version 2.0 database, detection performance improves by about 39% over the original baseline system.
KW - Gaussian mixture models
KW - automatic speaker verification
KW - replay speech detection
UR - https://www.scopus.com/pages/publications/85133979115
U2 - 10.1109/ICAICE54393.2021.00089
DO - 10.1109/ICAICE54393.2021.00089
M3 - Conference contribution
AN - SCOPUS:85133979115
T3 - Proceedings - 2021 2nd International Conference on Artificial Intelligence and Computer Engineering, ICAICE 2021
SP - 425
EP - 429
BT - Proceedings - 2021 2nd International Conference on Artificial Intelligence and Computer Engineering, ICAICE 2021
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2nd International Conference on Artificial Intelligence and Computer Engineering, ICAICE 2021
Y2 - 5 November 2021 through 7 November 2021
ER -