Abstract
In this paper, the acoustic characteristics and recognition of whispered speech are discussed. A Mandarin digit database containing both normal and whispered speech is built. The collected normal and whispered speech materials are analyzed to verify the characteristics of each kind of speech and the differences between them. Cross recognition is carried out using normal and whispered speech as the training and testing data, respectively, and the detailed recognition results are analyzed with confusion matrices. The results show that models trained on normal speech are not suitable for recognizing whispered speech, and that the word correct rate for whispered speech is closely related to its acoustic characteristics. Some possible solutions are also suggested.
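The abstract reports word correct rates and analyzes recognition errors through confusion matrices. The Python below is a minimal sketch of how such figures can be read off a digit confusion matrix. It assumes a hypothetical layout in which rows are spoken (reference) digits 0-9, columns 0-9 are recognized digits, and one extra column counts deletions. It also takes the word correct rate to be H/N (hits over reference words, ignoring insertions), the usual "percent correct" definition; whether the paper computes it exactly this way is an assumption. All function names are illustrative, and the counts in the usage lines are made-up toy data, not results from the paper.

```python
from typing import List, Tuple

NUM_DIGITS = 10   # digits 0-9; hypothetical label set for this sketch
DEL = NUM_DIGITS  # index of the assumed extra "deletion" column


def word_correct_rate(conf: List[List[int]]) -> float:
    """Percent correct H / N: diagonal hits over all reference words
    (hits + substitutions + deletions). Insertions are not counted."""
    hits = sum(conf[i][i] for i in range(NUM_DIGITS))
    n_ref = sum(sum(row) for row in conf)
    return 100.0 * hits / n_ref


def per_digit_correct(conf: List[List[int]]) -> List[float]:
    """Percent correct for each spoken digit (one row of the matrix)."""
    return [100.0 * conf[i][i] / sum(conf[i]) for i in range(NUM_DIGITS)]


def top_confusions(conf: List[List[int]], k: int = 3) -> List[Tuple[int, int, int]]:
    """Largest off-diagonal substitution counts as (spoken, recognized, count)."""
    subs = [
        (r, c, conf[r][c])
        for r in range(NUM_DIGITS)
        for c in range(NUM_DIGITS)  # deletion column deliberately excluded
        if r != c and conf[r][c] > 0
    ]
    subs.sort(key=lambda t: t[2], reverse=True)
    return subs[:k]


# Toy data only: 10 tokens per digit, all recognized correctly at first.
conf = [[10 if c == r else 0 for c in range(NUM_DIGITS + 1)] for r in range(NUM_DIGITS)]
conf[1][1], conf[1][7] = 7, 3    # "1" recognized as "7" three times
conf[6][6], conf[6][DEL] = 8, 2  # "6" deleted twice

print(word_correct_rate(conf))   # 95.0
print(top_confusions(conf))      # [(1, 7, 3)]
```

Insertions have no place in this layout because they do not affect percent correct. An accuracy figure that also penalizes insertions would need an extra insertion row.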
Field | Value |
---|---|
Original language | English |
Pages (from-to) | 1141-1144 |
Number of pages | 4 |
Journal | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
Publication status | Published - 2008 |
Event | INTERSPEECH 2008 - 9th Annual Conference of the International Speech Communication Association - Brisbane, QLD, Australia |
Duration | 22 Sept 2008 → 26 Sept 2008 |
Keywords
- Confusion matrix
- Connected digits
- Speech recognition
- Whispered speech