Acoustic features based on auditory model and adaptive fractional Fourier transform for speech recognition

Hui Yin*, Xiang Xie, Jingming Kuang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

2 Citations (Scopus)

Abstract

It is well known that the human auditory system achieves a level of performance that automatic speech recognition (ASR) systems cannot match, and that the fractional Fourier transform (FrFT) has unique advantages in nonstationary signal processing. In this paper, a Gammatone filterbank is applied to speech signals for front-end temporal filtering, and acoustic features are then extracted from the output subband signals using the FrFT. The transform order is critical for the FrFT. An order adaptation method based on the instantaneous frequency is proposed, and its performance is compared with that of a method based on the ambiguity function. ASR experiments are conducted on clean and noisy Mandarin digits. The results show that the proposed features achieve a significantly higher recognition rate than the MFCC baseline, and that the order adaptation method based on instantaneous frequency has much lower complexity than the one based on the ambiguity function. Furthermore, the FrFT-based features achieve the highest recognition rate when the proposed order adaptation method is used.
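The front-end steps named in the abstract (Gammatone subband filtering, then an instantaneous-frequency estimate per subband) can be illustrated with a minimal sketch. This is not the authors' implementation: all function names are hypothetical, the ERB constants are the commonly used Glasberg and Moore values rather than anything stated in the abstract, and the linear fit to the instantaneous frequency is only one plausible way to obtain a chirp rate. The mapping from chirp rate to FrFT order depends on the normalisation of the discrete FrFT and is deliberately left out.

```python
import numpy as np

def erb(fc_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg and Moore constants, assumed)."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)

def gammatone_ir(fc_hz, fs_hz, duration_s=0.025, order=4, b=1.019):
    """Unit-energy impulse response of a gammatone filter centred at fc_hz."""
    t = np.arange(int(duration_s * fs_hz)) / fs_hz
    envelope = t ** (order - 1) * np.exp(-2.0 * np.pi * b * erb(fc_hz) * t)
    ir = envelope * np.cos(2.0 * np.pi * fc_hz * t)
    return ir / np.sqrt(np.sum(ir ** 2))

def gammatone_filterbank(x, fs_hz, centre_freqs_hz):
    """Filter x with one gammatone filter per centre frequency; returns (channels, samples)."""
    return np.stack([np.convolve(x, gammatone_ir(fc, fs_hz), mode="same")
                     for fc in centre_freqs_hz])

def analytic_signal(x):
    """FFT-based analytic signal: zero the negative frequencies, double the positive ones."""
    n = len(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * weights)

def instantaneous_frequency(x, fs_hz):
    """Instantaneous frequency in Hz from the unwrapped phase of the analytic signal."""
    phase = np.unwrap(np.angle(analytic_signal(x)))
    return np.diff(phase) * fs_hz / (2.0 * np.pi)

def chirp_rate(x, fs_hz):
    """Slope (Hz/s) of a straight line fitted to the instantaneous frequency.

    The first and last 10% of the frame are discarded to limit edge effects.
    """
    inst_f = instantaneous_frequency(x, fs_hz)
    t = np.arange(len(inst_f)) / fs_hz
    k = len(inst_f) // 10
    slope, _ = np.polyfit(t[k:len(t) - k], inst_f[k:len(inst_f) - k], 1)
    return slope

# Synthetic check signals (not speech data from the paper).
fs = 8000
t = np.arange(2000) / fs
tone = np.cos(2.0 * np.pi * 1000.0 * t)
subbands = gammatone_filterbank(tone, fs, [500.0, 1000.0, 2000.0])
chirp = np.cos(2.0 * np.pi * (500.0 * t + 0.5 * 2000.0 * t ** 2))
estimated_rate = chirp_rate(chirp, fs)
```

In an adaptive scheme of the kind the abstract describes, a per-frame estimate such as `estimated_rate` would be converted to a transform order for each subband signal before the FrFT is applied.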

Original language: English
Pages (from-to): 97-103
Number of pages: 7
Journal: Shengxue Xuebao/Acta Acustica
Volume: 37
Issue number: 1
Publication status: Published - Jan 2012
