TY - GEN
T1 - Computational auditory scene analysis based voice activity detection
AU - Tu, Ming
AU - Xie, Xiang
AU - Na, Xingyu
N1 - Publisher Copyright: © 2014 IEEE.
PY - 2014/12/4
Y1 - 2014/12/4
N2 - Voice activity detection (VAD) is important in many speech applications. In this paper, two VAD methods using novel features based on computational auditory scene analysis (CASA) are proposed. The first method builds on statistical-model-based VAD: the cochleagram, instead of discrete Fourier transform coefficients, is used as the time-frequency representation. The second is a supervised method based on the Gaussian mixture model (GMM). We extract gammatone frequency cepstral coefficients (GFCC) from the cochleagram and use this feature to discriminate speech from noise in noisy signals; GMMs are used to model the GFCC of speech and noise. We evaluate both methods in the framework of the multiple observation likelihood ratio test. The performance of the two methods is compared with several existing algorithms. The results demonstrate that CASA-based features outperform several traditional features in the task of VAD, and the reasons for the superiority of the two proposed features are also investigated.
AB - Voice activity detection (VAD) is important in many speech applications. In this paper, two VAD methods using novel features based on computational auditory scene analysis (CASA) are proposed. The first method builds on statistical-model-based VAD: the cochleagram, instead of discrete Fourier transform coefficients, is used as the time-frequency representation. The second is a supervised method based on the Gaussian mixture model (GMM). We extract gammatone frequency cepstral coefficients (GFCC) from the cochleagram and use this feature to discriminate speech from noise in noisy signals; GMMs are used to model the GFCC of speech and noise. We evaluate both methods in the framework of the multiple observation likelihood ratio test. The performance of the two methods is compared with several existing algorithms. The results demonstrate that CASA-based features outperform several traditional features in the task of VAD, and the reasons for the superiority of the two proposed features are also investigated.
UR - http://www.scopus.com/inward/record.url?scp=84919882896&partnerID=8YFLogxK
U2 - 10.1109/ICPR.2014.147
DO - 10.1109/ICPR.2014.147
M3 - Conference contribution
AN - SCOPUS:84919882896
T3 - Proceedings - International Conference on Pattern Recognition
SP - 797
EP - 802
BT - Proceedings - International Conference on Pattern Recognition
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 22nd International Conference on Pattern Recognition, ICPR 2014
Y2 - 24 August 2014 through 28 August 2014
ER -