TY - JOUR
T1 - Frequency importance function of the speech intelligibility index for Mandarin Chinese
AU - Chen, Jing
AU - Huang, Qiang
AU - Wu, Xihong
N1 - Publisher Copyright: © 2016
PY - 2016/10/1
Y1 - 2016/10/1
AB - The speech intelligibility index (SII) is a widely used objective method of predicting speech intelligibility, in which the frequency importance function (FIF) is a key component. The FIF characterizes the relative contribution of different frequency bands to speech recognition. In this work, FIFs for Mandarin Chinese were derived for monosyllabic words spoken by male and female speakers. These words were phoneme balanced and selected from the word lists of a national standard, which have been used for measuring the articulation index in China since 1995. A pilot experiment was conducted to determine suitable signal-to-noise ratios (SNR) for measuring speech intelligibility. The main experiment was conducted to derive the FIFs using 288 test conditions (4 SNRs × 36 filtering conditions × 2 speaker genders). The noise was speech-spectrum shaped and it was generated separately for the male and female speech materials. The results show that, using 1/3 octave analysis bands: (1) The FIF averaged across genders has a peak in the frequency range between 1000 and 2500 Hz, which is consistent with the FIF for English monosyllabic words; (2) The frequency bands centered at 160, 1600, and 2000 Hz are slightly more important for Mandarin Chinese than for English; (3) Male speech is more intelligible than female speech, and the band centered at 160 Hz is more important for female than male speech. The FIF differences between Mandarin and English and the effect of speaker gender are analyzed and discussed.
KW - Frequency importance function
KW - Mandarin Chinese
KW - Speaker gender
KW - Speech intelligibility index
UR - http://www.scopus.com/inward/record.url?scp=84983382288&partnerID=8YFLogxK
U2 - 10.1016/j.specom.2016.07.009
DO - 10.1016/j.specom.2016.07.009
M3 - Article
AN - SCOPUS:84983382288
SN - 0167-6393
VL - 83
SP - 94
EP - 103
JO - Speech Communication
JF - Speech Communication
ER -