Variable rate characteristic waveform interpolation speech coder based on phonetic classification

Jing Wang*, Jing Ming Kuang, Sheng Hui Zhao

*Corresponding author for this work

School of Information and Electronics

Beijing Institute of Technology

Research output: Contribution to journal › Article › peer-review

Abstract

A variable-bit-rate characteristic waveform interpolation (VBR-CWI) speech codec is proposed that integrates phonetic classification into characteristic waveform (CW) decomposition and operates at an average bit rate of about 1.8 kbit/s. Each input frame is classified into one of four phonetic classes. Non-speech frames are represented with a Bark-band noise model. For unvoiced frames the extracted CWs are treated entirely as rapidly evolving waveforms (REWs), and for stationary voiced frames entirely as slowly evolving waveforms (SEWs), while mixed voiced frames use the same CW decomposition as conventional CWI. Experimental results show that the proposed codec eliminates most of the buzzy and noisy artifacts present in the fixed-bit-rate characteristic waveform interpolation (FBR-CWI) speech codec, that its average bit rate is much lower, and that its reconstructed speech quality is much better than FS 1016 CELP at 4.8 kbit/s and similar to G.723.1 ACELP at 5.3 kbit/s.
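The class-dependent decomposition described in the abstract can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the `FrameClass` names are inferred from the four frame types the abstract mentions, `decompose` is a hypothetical helper, and the per-frame mean is only a crude stand-in for the low-pass filter along the evolution axis that conventional CWI uses to obtain the SEW.

```python
from enum import Enum


class FrameClass(Enum):
    # Class names inferred from the abstract; the paper may label them differently.
    NON_SPEECH = 0
    UNVOICED = 1
    STATIONARY_VOICED = 2
    MIXED_VOICED = 3


def frame_mean(cws):
    """Average the CWs of one frame sample by sample.

    Assumption: this stands in for the low-pass filter that conventional
    CWI applies along the evolution axis to extract the SEW.
    """
    count = len(cws)
    return [sum(cw[k] for cw in cws) / count for k in range(len(cws[0]))]


def decompose(cws, frame_class):
    """Split a frame's CWs into (sews, rews) according to its phonetic class."""
    zeros = [[0.0] * len(cw) for cw in cws]
    copies = [list(cw) for cw in cws]
    if frame_class is FrameClass.NON_SPEECH:
        # Non-speech frames are coded with a noise model, not decomposed.
        raise ValueError("non-speech frames bypass CW decomposition")
    if frame_class is FrameClass.UNVOICED:
        return zeros, copies   # every CW is treated as a REW
    if frame_class is FrameClass.STATIONARY_VOICED:
        return copies, zeros   # every CW is treated as a SEW
    # Mixed voiced: conventional split, SEW = smoothed CW, REW = remainder.
    mean = frame_mean(cws)
    sews = [list(mean) for _ in cws]
    rews = [[x - m for x, m in zip(cw, mean)] for cw in cws]
    return sews, rews
```

In this sketch only mixed voiced frames produce both components, which suggests why a class-dependent scheme can spend fewer bits on unvoiced and stationary voiced frames; the actual bit allocation is not given in the abstract.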

Original language	English
Pages (from-to)	187-192
Number of pages	6
Journal	Journal of Beijing Institute of Technology (English Edition)
Volume	16
Issue number	2
Publication status	Published - Jun 2007

Keywords

Characteristic waveform interpolation
Phonetic classification
Variable bit rate speech coding

Cite this

@article{cc7ded65be0e4796a8892a941e221145,

title = "Variable rate characteristic waveform interpolation speech coder based on phonetic classification",

abstract = "A variable-bit-rate characteristic waveform interpolation (VBR-CWI) speech codec with about 1.8 kbit/s average bit rate which integrates phonetic classification into characteristic waveform (CW) decomposition is proposed. Each input frame is classified into one of 4 phonetic classes. Non-speech frames are represented with Bark-band noise model. The extracted CWs become rapidly evolving waveforms (REWs) or slowly evolving waveforms (SEWs) in the cases of unvoiced or stationary voiced frames respectively, while mixed voiced frames use the same CW decomposition as that in the conventional CWI. Experimental results show that the proposed codec can eliminate most buzzy and noisy artifacts existing in the fixed-bit-rate characteristic waveform interpolation (FBR-CWI) speech codec, the average bit rate can be much lower, and its reconstructed speech quality is much better than FS 1016 CELP at 4.8 kbit/s and similar to G. 723.1 ACELP at 5.3 kbit/s.",

keywords = "Characteristic waveform interpolation, Phonetic classification, Variable bit rate speech coding",

author = "Jing Wang and Kuang, {Jing Ming} and Zhao, {Sheng Hui}",

year = "2007",

month = jun,

language = "English",

volume = "16",

pages = "187--192",

journal = "Journal of Beijing Institute of Technology (English Edition)",

issn = "1004-0579",

publisher = "Beijing Institute of Technology",

number = "2",

}

TY - JOUR

T1 - Variable rate characteristic waveform interpolation speech coder based on phonetic classification

AU - Wang, Jing

AU - Kuang, Jing Ming

AU - Zhao, Sheng Hui

PY - 2007/6

Y1 - 2007/6

N2 - A variable-bit-rate characteristic waveform interpolation (VBR-CWI) speech codec with about 1.8 kbit/s average bit rate which integrates phonetic classification into characteristic waveform (CW) decomposition is proposed. Each input frame is classified into one of 4 phonetic classes. Non-speech frames are represented with Bark-band noise model. The extracted CWs become rapidly evolving waveforms (REWs) or slowly evolving waveforms (SEWs) in the cases of unvoiced or stationary voiced frames respectively, while mixed voiced frames use the same CW decomposition as that in the conventional CWI. Experimental results show that the proposed codec can eliminate most buzzy and noisy artifacts existing in the fixed-bit-rate characteristic waveform interpolation (FBR-CWI) speech codec, the average bit rate can be much lower, and its reconstructed speech quality is much better than FS 1016 CELP at 4.8 kbit/s and similar to G. 723.1 ACELP at 5.3 kbit/s.

AB - A variable-bit-rate characteristic waveform interpolation (VBR-CWI) speech codec with about 1.8 kbit/s average bit rate which integrates phonetic classification into characteristic waveform (CW) decomposition is proposed. Each input frame is classified into one of 4 phonetic classes. Non-speech frames are represented with Bark-band noise model. The extracted CWs become rapidly evolving waveforms (REWs) or slowly evolving waveforms (SEWs) in the cases of unvoiced or stationary voiced frames respectively, while mixed voiced frames use the same CW decomposition as that in the conventional CWI. Experimental results show that the proposed codec can eliminate most buzzy and noisy artifacts existing in the fixed-bit-rate characteristic waveform interpolation (FBR-CWI) speech codec, the average bit rate can be much lower, and its reconstructed speech quality is much better than FS 1016 CELP at 4.8 kbit/s and similar to G. 723.1 ACELP at 5.3 kbit/s.

KW - Characteristic waveform interpolation

KW - Phonetic classification

KW - Variable bit rate speech coding

UR - http://www.scopus.com/inward/record.url?scp=34447345113&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:34447345113

SN - 1004-0579

VL - 16

SP - 187

EP - 192

JO - Journal of Beijing Institute of Technology (English Edition)

JF - Journal of Beijing Institute of Technology (English Edition)

IS - 2

ER -
