Computational auditory scene analysis based voice activity detection

Ming Tu, Xiang Xie, Xingyu Na

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

4 Citations (Scopus)

Abstract

Voice activity detection (VAD) is important in many speech applications. In this paper, two VAD methods using novel features based on computational auditory scene analysis (CASA) are proposed. The first is a statistical-model-based VAD in which the cochleagram, rather than discrete Fourier transform coefficients, serves as the time-frequency representation. The second is a supervised method based on Gaussian mixture models (GMMs): gammatone frequency cepstral coefficients (GFCC) are extracted from the cochleagram and used to discriminate speech from noise in noisy signals, with separate GMMs modelling the GFCC of speech and of noise. Both methods are evaluated within the framework of the multiple observation likelihood ratio test, and their performance is compared with that of several existing algorithms. The results demonstrate that the CASA-based features outperform several traditional features for VAD, and the reasons for their superiority are also investigated.
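The abstract's decision framework, the multiple observation likelihood ratio test, can be illustrated with a minimal sketch: per-frame log-likelihood ratios between a speech model and a noise model are averaged over a sliding window of observations before thresholding. The sketch below is an assumption-laden simplification, not the paper's implementation: each class is stood in for by a single diagonal Gaussian rather than a full GMM, and random vectors replace GFCC features; the function names `diag_gauss_logpdf` and `mo_lrt_vad` are hypothetical.

```python
import numpy as np

def diag_gauss_logpdf(x, mean, var):
    """Log-density of a diagonal Gaussian, evaluated for each frame (row) of x."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var, axis=-1)

def mo_lrt_vad(features, speech_model, noise_model, window=5, threshold=0.0):
    """Multiple observation likelihood ratio test over a sliding window.

    features: (T, D) array of per-frame feature vectors (GFCC in the paper;
    random toy data here). Each frame's decision averages the per-frame
    log-likelihood ratios of the 2*window+1 frames centred on it, which is
    the "multiple observation" part of the test.
    """
    llr = (diag_gauss_logpdf(features, *speech_model)
           - diag_gauss_logpdf(features, *noise_model))
    kernel = np.ones(2 * window + 1) / (2 * window + 1)
    smoothed = np.convolve(llr, kernel, mode="same")
    return smoothed > threshold

# Toy usage: 50 "noise" frames near 0 followed by 50 "speech" frames near 3.
rng = np.random.default_rng(0)
feats = np.vstack([rng.normal(0.0, 1.0, (50, 4)), rng.normal(3.0, 1.0, (50, 4))])
speech_model = (np.full(4, 3.0), np.ones(4))  # (mean, variance) per class
noise_model = (np.zeros(4), np.ones(4))
decisions = mo_lrt_vad(feats, speech_model, noise_model)
```

Averaging the ratios over neighbouring frames exploits the fact that speech presence is strongly correlated across adjacent frames, which is why the windowed test is more robust than a frame-by-frame threshold.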

Original language: English
Title of host publication: Proceedings - International Conference on Pattern Recognition
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 797-802
Number of pages: 6
ISBN (Electronic): 9781479952083
DOIs
Publication status: Published - 4 Dec 2014
Event: 22nd International Conference on Pattern Recognition, ICPR 2014 - Stockholm, Sweden
Duration: 24 Aug 2014 - 28 Aug 2014

Publication series

Name: Proceedings - International Conference on Pattern Recognition
ISSN (Print): 1051-4651

Conference

Conference: 22nd International Conference on Pattern Recognition, ICPR 2014
Country/Territory: Sweden
City: Stockholm
Period: 24/08/14 - 28/08/14
