
Syllable-level modeling of voice aperiodicity contours for embedded Mandarin speech synthesis systems

  • Chaomin Wang
  • Xiang Xie*
  • Jingming Kuang
  • *Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

The mixed excitation source-filter model is used in most statistical parametric speech synthesis systems, so voice aperiodicity is a crucial factor in the perceived quality of the synthesized voice. Two problems arise: the precision of the aperiodicity model needs to be improved, and the model must be compressed to fit embedded speech synthesis systems. The voice aperiodicity of one frame is shown to be related to that of other frames within the time span of one syllable. The band aperiodicity contours of each syllable are therefore fitted with a discrete cosine transform (DCT). Tests show that the band aperiodicity (BAP) model can be compressed to 6.64% of that of the baseline system while providing nearly the same perceived quality of the synthesized speech.
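As a rough illustration of fitting a syllable-length contour with a DCT, the sketch below keeps only the first few coefficients of an orthonormal DCT-II and rebuilds the contour from them. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `dct_fit`, `dct_reconstruct`, and `rms_error`, the synthetic 20-frame contour, and the choice of 5 coefficients are all hypothetical, and the abstract does not state how many coefficients the paper uses.

```python
import math


def dct_fit(contour, n_coeffs):
    """Return the first n_coeffs orthonormal DCT-II coefficients of a contour."""
    n = len(contour)
    coeffs = []
    for k in range(n_coeffs):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(
            x * math.cos(math.pi * (t + 0.5) * k / n)
            for t, x in enumerate(contour)
        )
        coeffs.append(scale * total)
    return coeffs


def dct_reconstruct(coeffs, n_frames):
    """Rebuild a contour from truncated coefficients (inverse of dct_fit).

    n_frames must equal the length of the contour that was fitted.
    """
    contour = []
    for t in range(n_frames):
        value = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1.0 / n_frames) if k == 0 else math.sqrt(2.0 / n_frames)
            value += scale * c * math.cos(math.pi * (t + 0.5) * k / n_frames)
        contour.append(value)
    return contour


def rms_error(a, b):
    """Root-mean-square difference between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


# Hypothetical band aperiodicity contour (in dB) for one 20-frame syllable.
contour = [-20.0 + 8.0 * math.sin(math.pi * t / 19) for t in range(20)]

# Assumption: 5 coefficients per band per syllable, chosen only for illustration.
coeffs = dct_fit(contour, 5)
approx = dct_reconstruct(coeffs, len(contour))
print("stored values:", len(coeffs), "instead of", len(contour))
print("RMS error (dB):", rms_error(contour, approx))
```

Because the basis is orthonormal, adding coefficients can only reduce the RMS reconstruction error, so the coefficient count trades precision against storage. The frame-to-coefficient ratio in this toy example is not meant to reproduce the 6.64% figure reported above, which refers to the size of the BAP model.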

Original language: English
Pages (from-to): 767-770+780
Journal: Qinghua Daxue Xuebao/Journal of Tsinghua University
Volume: 53
Issue number: 6
Publication status: Published - 2013

Keywords

  • Aperiodicity contour
  • Speech synthesis
  • Syllable-level modeling
