Abstract
Most statistical parametric speech synthesis systems use a mixed-excitation source-filter model, so voice aperiodicity strongly affects the perceived quality of the synthesized voice. Two problems arise: the aperiodicity model needs to be more precise, and it must be compressed to fit embedded speech synthesis systems. The voice aperiodicity of one frame is shown to be related to that of other frames on the time scale of one syllable. The band aperiodicity contours of each syllable are therefore fitted with a discrete cosine transform (DCT). Tests show that the band aperiodicity (BAP) model can be compressed to 6.64% of that of the baseline system while giving nearly the same perceived quality of synthesized speech.
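The idea of fitting a syllable-long contour with a DCT can be illustrated with a minimal sketch. This is not the paper's implementation: the contour values, the frame count (20), the number of retained coefficients (4), and the function names `dct2`, `idct2`, and `compress_contour` are all assumptions made for illustration. It only shows how a smooth per-syllable contour for one band might be summarized by a few low-order DCT coefficients and then approximately reconstructed.

```python
import math


def dct2(x):
    """Orthonormal DCT-II of a sequence of floats."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def idct2(coeffs, n):
    """Inverse of dct2 for n samples; coefficients beyond len(coeffs) count as zero."""
    out = []
    for t in range(n):
        s = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            s += scale * c * math.cos(math.pi * (t + 0.5) * k / n)
        out.append(s)
    return out


def compress_contour(contour, num_coeffs):
    """Keep only the first num_coeffs DCT coefficients of the contour."""
    return dct2(contour)[:num_coeffs]


# Hypothetical contour: one aperiodicity band over a 20-frame syllable (values in dB).
contour = [-20.0 + 5.0 * math.sin(math.pi * t / 19) for t in range(20)]

coeffs = compress_contour(contour, 4)     # 4 numbers instead of 20
approx = idct2(coeffs, len(contour))      # approximate contour from the 4 numbers
max_err = max(abs(a - b) for a, b in zip(contour, approx))
```

Keeping all coefficients reproduces the contour exactly, so the trade-off between model size and contour precision is governed by how many coefficients are retained. The abstract does not state how many the authors use.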
| Original language | English |
|---|---|
| Pages (from-to) | 767-770+780 |
| Journal | Qinghua Daxue Xuebao/Journal of Tsinghua University |
| Volume | 53 |
| Issue number | 6 |
| Publication status | Published - 2013 |
Keywords
- Aperiodicity contour
- Speech synthesis
- Syllable-level modeling
Title: Syllable-level modeling of voice aperiodicity contours for embedded Mandarin speech synthesis systems