TY - GEN
T1 - Learning multi-resolution representations for acoustic scene classification via neural networks
AU - Yang, Zijiang
AU - Qian, Kun
AU - Ren, Zhao
AU - Baird, Alice
AU - Zhang, Zixing
AU - Schuller, Björn
N1 - Publisher Copyright:
© Springer Nature Singapore Pte Ltd 2020.
PY - 2020
Y1 - 2020
N2 - This study investigates the performance of wavelet features as well as conventional temporal and spectral features for acoustic scene classification, testing the effectiveness of both feature sets when combined with neural networks. The TUT Acoustic Scenes 2017 Database is used for the evaluation of the system. The model with wavelet energy features achieves 74.8 % and 60.2 % on the development and evaluation sets, respectively, outperforming the model using the temporal and spectral feature set (72.9 % and 59.4 %). Additionally, to optimise the generalisation and robustness of the models, a decision fusion method based on the posterior probability of each audio scene is applied. Compared with the baseline system of the Detection and Classification of Acoustic Scenes and Events 2017 (DCASE 2017) challenge, the best decision fusion model achieves 79.2 % and 63.8 % on the development and evaluation sets, respectively; both results significantly exceed the baseline system results of 74.8 % and 61.0 % (confirmed by a one-tailed z-test, p < 0.01 and p < 0.05, respectively).
AB - This study investigates the performance of wavelet features as well as conventional temporal and spectral features for acoustic scene classification, testing the effectiveness of both feature sets when combined with neural networks. The TUT Acoustic Scenes 2017 Database is used for the evaluation of the system. The model with wavelet energy features achieves 74.8 % and 60.2 % on the development and evaluation sets, respectively, outperforming the model using the temporal and spectral feature set (72.9 % and 59.4 %). Additionally, to optimise the generalisation and robustness of the models, a decision fusion method based on the posterior probability of each audio scene is applied. Compared with the baseline system of the Detection and Classification of Acoustic Scenes and Events 2017 (DCASE 2017) challenge, the best decision fusion model achieves 79.2 % and 63.8 % on the development and evaluation sets, respectively; both results significantly exceed the baseline system results of 74.8 % and 61.0 % (confirmed by a one-tailed z-test, p < 0.01 and p < 0.05, respectively).
KW - Acoustic Scene Classification
KW - Machine Learning
KW - Neural Networks
KW - Wavelets
UR - http://www.scopus.com/inward/record.url?scp=85078485967&partnerID=8YFLogxK
U2 - 10.1007/978-981-15-2756-2_11
DO - 10.1007/978-981-15-2756-2_11
M3 - Conference contribution
AN - SCOPUS:85078485967
SN - 9789811527555
T3 - Lecture Notes in Electrical Engineering
SP - 133
EP - 143
BT - Proceedings of the 7th Conference on Sound and Music Technology (CSMT 2019), Revised Selected Papers
A2 - Li, Haifeng
A2 - Ma, Lin
A2 - Li, Shengchen
A2 - Fang, Chunying
A2 - Zhu, Yidan
PB - Springer
T2 - 7th Conference on Sound and Music Technology, CSMT 2019
Y2 - 26 December 2019 through 29 December 2019
ER -