Learning multi-resolution representations for acoustic scene classification via neural networks

Zijiang Yang; Kun Qian; Zhao Ren; Alice Baird; Zixing Zhang; Björn Schuller

doi:10.1007/978-981-15-2756-2_11

Zijiang Yang, Kun Qian^*, Zhao Ren, Alice Baird, Zixing Zhang, Björn Schuller

^*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

5 Citations (Scopus)

Abstract

This study investigates the performance of wavelet features as well as conventional temporal and spectral features for acoustic scene classification, testing the effectiveness of both feature sets when combined with neural networks. The TUT Acoustic Scenes 2017 Database is used to evaluate the system. The model with the wavelet energy feature achieved 74.8 % and 60.2 % on the development and evaluation sets, respectively, which is better than the model using the temporal and spectral feature set (72.9 % and 59.4 %). Additionally, to optimise the generalisation and robustness of the models, a decision fusion method based on the posterior probability of each audio scene is used. The best decision fusion model achieves 79.2 % and 63.8 % on the development and evaluation sets, respectively; both results significantly exceed those of the baseline system of the Detection and Classification of Acoustic Scenes and Events 2017 (DCASE 2017) challenge, 74.8 % and 61.0 % (confirmed by a one-tailed z-test, p < 0.01 and p < 0.05, respectively).
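
The two computational steps named in the abstract, decision fusion over posterior probabilities and a one-tailed z-test, can be sketched as follows. This is a minimal illustration, not the authors' implementation. The abstract does not state the fusion rule, so averaging the per-class posteriors is an assumption; the function names, the example posteriors, and the sample count of 2000 are hypothetical; and the reported percentages are treated as classification accuracies, which is also an assumption.

```python
import math


def fuse_posteriors(posteriors_per_model):
    """Average per-class posteriors across models (assumed fusion rule).

    Returns the index of the winning class and the fused posterior vector.
    """
    n_models = len(posteriors_per_model)
    n_classes = len(posteriors_per_model[0])
    fused = [
        sum(model[c] for model in posteriors_per_model) / n_models
        for c in range(n_classes)
    ]
    best = max(range(n_classes), key=fused.__getitem__)
    return best, fused


def one_tailed_z_test(acc_new, acc_base, n_new, n_base):
    """Two-proportion z-test with alternative hypothesis acc_new > acc_base.

    Returns the z statistic and the one-tailed p-value.
    """
    pooled = (acc_new * n_new + acc_base * n_base) / (n_new + n_base)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_new + 1 / n_base))
    z = (acc_new - acc_base) / se
    # Upper-tail probability of the standard normal distribution.
    p = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p


# Hypothetical posteriors over three scenes from two models.
wavelet_model = [0.50, 0.30, 0.20]
spectral_model = [0.20, 0.45, 0.35]
label, fused = fuse_posteriors([wavelet_model, spectral_model])

# Development-set scores from the abstract; the sample count is hypothetical.
n_segments = 2000
z, p = one_tailed_z_test(0.792, 0.748, n_segments, n_segments)
print(label, fused, round(z, 2), p)
```

In this toy case the two models disagree on the most likely scene, and the fused decision follows whichever class has the higher mean posterior. The outcome of the z-test depends strongly on the number of test segments, which the abstract does not report.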

Original language	English
Title of host publication	Proceedings of the 7th Conference on Sound and Music Technology CSMT 2019, Revised Selected Papers
Editors	Haifeng Li, Lin Ma, Shengchen Li, Chunying Fang, Yidan Zhu
Publisher	Springer
Pages	133-143
Number of pages	11
ISBN (Print)	9789811527555
DOIs	https://doi.org/10.1007/978-981-15-2756-2_11
Publication status	Published - 2020
Externally published	Yes
Event	7th Conference on Sound and Music Technology, CSMT 2019 - Harbin, China Duration: 26 Dec 2019 → 29 Dec 2019

Publication series

Name	Lecture Notes in Electrical Engineering
Volume	635
ISSN (Print)	1876-1100
ISSN (Electronic)	1876-1119

Conference

Conference	7th Conference on Sound and Music Technology, CSMT 2019
Country/Territory	China
City	Harbin
Period	26/12/19 → 29/12/19

Keywords

Acoustic Scene Classification
Machine Learning
Neural Networks
Wavelets

Cite this

Yang, Z., Qian, K., Ren, Z., Baird, A., Zhang, Z., & Schuller, B. (2020). Learning multi-resolution representations for acoustic scene classification via neural networks. In H. Li, L. Ma, S. Li, C. Fang, & Y. Zhu (Eds.), Proceedings of the 7th Conference on Sound and Music Technology CSMT 2019, Revised Selected Papers (pp. 133-143). (Lecture Notes in Electrical Engineering; Vol. 635). Springer. https://doi.org/10.1007/978-981-15-2756-2_11

@inproceedings{2855130c0d2a48db85d96f3942297f8e,
title = "Learning multi-resolution representations for acoustic scene classification via neural networks",
abstract = "This study investigates the performance of wavelet features as well as conventional temporal and spectral features for acoustic scene classification, testing the effectiveness of both feature sets when combined with neural networks. The TUT Acoustic Scenes 2017 Database is used to evaluate the system. The model with the wavelet energy feature achieved 74.8 % and 60.2 % on the development and evaluation sets, respectively, which is better than the model using the temporal and spectral feature set (72.9 % and 59.4 %). Additionally, to optimise the generalisation and robustness of the models, a decision fusion method based on the posterior probability of each audio scene is used. The best decision fusion model achieves 79.2 % and 63.8 % on the development and evaluation sets, respectively; both results significantly exceed those of the baseline system of the Detection and Classification of Acoustic Scenes and Events 2017 (DCASE 2017) challenge, 74.8 % and 61.0 % (confirmed by a one-tailed z-test, p < 0.01 and p < 0.05, respectively).",
keywords = "Acoustic Scene Classification, Machine Learning, Neural Networks, Wavelets",
author = "Zijiang Yang and Kun Qian and Zhao Ren and Alice Baird and Zixing Zhang and Bj{\"o}rn Schuller",
note = "Publisher Copyright: {\textcopyright} Springer Nature Singapore Pte Ltd 2020.; 7th Conference on Sound and Music Technology, CSMT 2019 ; Conference date: 26-12-2019 Through 29-12-2019",
year = "2020",
doi = "10.1007/978-981-15-2756-2_11",
language = "English",
isbn = "9789811527555",
series = "Lecture Notes in Electrical Engineering",
publisher = "Springer",
pages = "133--143",
editor = "Haifeng Li and Lin Ma and Shengchen Li and Chunying Fang and Yidan Zhu",
booktitle = "Proceedings of the 7th Conference on Sound and Music Technology CSMT 2019, Revised Selected Papers",
address = "Germany",
}
