Optimization of EVS speech/music classifier based on deep learning

Zhitong Li; Xiang Xie; Jing Wang; Volodya Grancharov; Wei Liu

doi:10.1109/ICSP.2018.8652295

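School of Information and Electronics

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-reviewed

3 citations (Scopus)

Abstract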

EVS (Enhanced Voice Services) is a multi-mode codec proposed by 3GPP (3rd Generation Partnership Project) for 4G mobile services, offering good performance and codec quality. The key technology of EVS is its flexible switching between speech and audio coding modes, which depends largely on the speech/music classifier. In general, a music signal is more complex than a speech signal and conforms less to any known LP (Linear Prediction)-based model. Taking the EVS internal classifier as the baseline system, this study optimizes the speech/music classifier from a neural network perspective. The paper demonstrates the effectiveness of the optimized system on the MUSAN database. The experimental results show that the optimized system can improve the performance of the classifier, especially for music classification. Subjective experiments indicate that the proposed classification architecture improves the perceived audio quality of the EVS codec.

Original language	English
Title of host publication	2018 14th IEEE International Conference on Signal Processing Proceedings, ICSP 2018
Editors	Yuan Baozong, Ruan Qiuqi, Zhao Yao, An Gaoyun
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	260-264
Number of pages	5
ISBN (Electronic)	9781538646724
DOI	https://doi.org/10.1109/ICSP.2018.8652295
Publication status	Published - 2 Feb 2019
Event	14th IEEE International Conference on Signal Processing, ICSP 2018 - Beijing, China. Duration: 12 Aug 2018 → 16 Aug 2018

Publication series

Name	International Conference on Signal Processing Proceedings, ICSP
Volume	2018-August

Conference

Conference	14th IEEE International Conference on Signal Processing, ICSP 2018
Country/Territory	China
City	Beijing
Period	12/08/18 → 16/08/18

Cite this

Li, Z., Xie, X., Wang, J., Grancharov, V., & Liu, W. (2019). Optimization of EVS speech/music classifier based on deep learning. In Y. Baozong, R. Qiuqi, Z. Yao, & A. Gaoyun (Eds.), 2018 14th IEEE International Conference on Signal Processing Proceedings, ICSP 2018 (pp. 260-264). Article 8652295 (International Conference on Signal Processing Proceedings, ICSP; Vol. 2018-August). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICSP.2018.8652295

Li, Zhitong; Xie, Xiang; Wang, Jing et al. / Optimization of EVS speech/music classifier based on deep learning. 2018 14th IEEE International Conference on Signal Processing Proceedings, ICSP 2018. ed. / Yuan Baozong; Ruan Qiuqi; Zhao Yao; An Gaoyun. Institute of Electrical and Electronics Engineers Inc., 2019. pp. 260-264 (International Conference on Signal Processing Proceedings, ICSP).

@inproceedings{7e7d5c2004964dc6bfd837db20fb92c4,

title = "Optimization of EVS speech/music classifier based on deep learning",

abstract = "EVS (Enhanced Voice Services) is a multi-mode codec proposed by 3GPP (3rd Generation Partnership Project) for 4G mobile services, offering good performance and codec quality. The key technology of EVS is its flexible switching between speech and audio coding modes, which depends largely on the speech/music classifier. In general, a music signal is more complex than a speech signal and conforms less to any known LP (Linear Prediction)-based model. Taking the EVS internal classifier as the baseline system, this study optimizes the speech/music classifier from a neural network perspective. The paper demonstrates the effectiveness of the optimized system on the MUSAN database. The experimental results show that the optimized system can improve the performance of the classifier, especially for music classification. Subjective experiments indicate that the proposed classification architecture improves the perceived audio quality of the EVS codec.",

keywords = "Audio test, Deep Learning, EVS, Speech/Music classifier",

author = "Zhitong Li and Xiang Xie and Jing Wang and Volodya Grancharov and Wei Liu",

note = "Publisher Copyright: {\textcopyright} 2018 IEEE.; 14th IEEE International Conference on Signal Processing, ICSP 2018 ; Conference date: 12-08-2018 Through 16-08-2018",

year = "2019",

month = feb,

day = "2",

doi = "10.1109/ICSP.2018.8652295",

language = "English",

series = "International Conference on Signal Processing Proceedings, ICSP",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "260--264",

editor = "Yuan Baozong and Ruan Qiuqi and Zhao Yao and An Gaoyun",

booktitle = "2018 14th IEEE International Conference on Signal Processing Proceedings, ICSP 2018",

address = "United States",

}

Li, Z, Xie, X, Wang, J, Grancharov, V & Liu, W 2019, Optimization of EVS speech/music classifier based on deep learning. in Y Baozong, R Qiuqi, Z Yao & A Gaoyun (eds), 2018 14th IEEE International Conference on Signal Processing Proceedings, ICSP 2018, 8652295, International Conference on Signal Processing Proceedings, ICSP, vol. 2018-August, Institute of Electrical and Electronics Engineers Inc., pp. 260-264, 14th IEEE International Conference on Signal Processing, ICSP 2018, Beijing, China, 12/08/18. https://doi.org/10.1109/ICSP.2018.8652295

Optimization of EVS speech/music classifier based on deep learning. / Li, Zhitong; Xie, Xiang; Wang, Jing et al.
2018 14th IEEE International Conference on Signal Processing Proceedings, ICSP 2018. ed. / Yuan Baozong; Ruan Qiuqi; Zhao Yao; An Gaoyun. Institute of Electrical and Electronics Engineers Inc., 2019. pp. 260-264, 8652295 (International Conference on Signal Processing Proceedings, ICSP; Vol. 2018-August).

TY - GEN

T1 - Optimization of EVS speech/music classifier based on deep learning

AU - Li, Zhitong

AU - Xie, Xiang

AU - Wang, Jing

AU - Grancharov, Volodya

AU - Liu, Wei

PY - 2019/2/2

Y1 - 2019/2/2

N2 - EVS (Enhanced Voice Services) is a multi-mode codec proposed by 3GPP (3rd Generation Partnership Project) for 4G mobile services, offering good performance and codec quality. The key technology of EVS is its flexible switching between speech and audio coding modes, which depends largely on the speech/music classifier. In general, a music signal is more complex than a speech signal and conforms less to any known LP (Linear Prediction)-based model. Taking the EVS internal classifier as the baseline system, this study optimizes the speech/music classifier from a neural network perspective. The paper demonstrates the effectiveness of the optimized system on the MUSAN database. The experimental results show that the optimized system can improve the performance of the classifier, especially for music classification. Subjective experiments indicate that the proposed classification architecture improves the perceived audio quality of the EVS codec.

AB - EVS (Enhanced Voice Services) is a multi-mode codec proposed by 3GPP (3rd Generation Partnership Project) for 4G mobile services, offering good performance and codec quality. The key technology of EVS is its flexible switching between speech and audio coding modes, which depends largely on the speech/music classifier. In general, a music signal is more complex than a speech signal and conforms less to any known LP (Linear Prediction)-based model. Taking the EVS internal classifier as the baseline system, this study optimizes the speech/music classifier from a neural network perspective. The paper demonstrates the effectiveness of the optimized system on the MUSAN database. The experimental results show that the optimized system can improve the performance of the classifier, especially for music classification. Subjective experiments indicate that the proposed classification architecture improves the perceived audio quality of the EVS codec.

KW - Audio test

KW - Deep Learning

KW - EVS

KW - Speech/Music classifier

UR - http://www.scopus.com/inward/record.url?scp=85063272480&partnerID=8YFLogxK

U2 - 10.1109/ICSP.2018.8652295

DO - 10.1109/ICSP.2018.8652295

M3 - Conference contribution

AN - SCOPUS:85063272480

T3 - International Conference on Signal Processing Proceedings, ICSP

SP - 260

EP - 264

BT - 2018 14th IEEE International Conference on Signal Processing Proceedings, ICSP 2018

A2 - Baozong, Yuan

A2 - Qiuqi, Ruan

A2 - Yao, Zhao

A2 - Gaoyun, An

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 14th IEEE International Conference on Signal Processing, ICSP 2018

Y2 - 12 August 2018 through 16 August 2018

ER -

Li Z, Xie X, Wang J, Grancharov V, Liu W. Optimization of EVS speech/music classifier based on deep learning. In Baozong Y, Qiuqi R, Yao Z, Gaoyun A, editors, 2018 14th IEEE International Conference on Signal Processing Proceedings, ICSP 2018. Institute of Electrical and Electronics Engineers Inc. 2019. p. 260-264. 8652295. (International Conference on Signal Processing Proceedings, ICSP). doi: 10.1109/ICSP.2018.8652295
