TY - GEN
T1 - Learning multi-resolution representations for acoustic scene classification via neural networks
AU - Yang, Zijiang
AU - Qian, Kun
AU - Ren, Zhao
AU - Baird, Alice
AU - Zhang, Zixing
AU - Schuller, Björn
N1 - Publisher Copyright:
© Springer Nature Singapore Pte Ltd 2020.
PY - 2020
Y1 - 2020
N2 - This study investigates the performance of wavelet features as well as conventional temporal and spectral features for acoustic scene classification, testing the effectiveness of both feature sets when combined with neural networks. The TUT Acoustic Scenes 2017 Database is used for the evaluation of the system. The model with wavelet energy features achieves 74.8 % and 60.2 % on the development and evaluation sets, respectively, outperforming the model using the temporal and spectral feature set (72.9 % and 59.4 %). Additionally, to optimise the generalisation and robustness of the models, a decision fusion method based on the posterior probability of each audio scene is applied. Compared with the baseline system of the Detection and Classification of Acoustic Scenes and Events 2017 (DCASE 2017) challenge, the best decision fusion model achieves 79.2 % and 63.8 % on the development and evaluation sets, respectively; both results significantly exceed the baseline system results of 74.8 % and 61.0 % (confirmed by a one-tailed z-test, p < 0.01 and p < 0.05, respectively).
AB - This study investigates the performance of wavelet features as well as conventional temporal and spectral features for acoustic scene classification, testing the effectiveness of both feature sets when combined with neural networks. The TUT Acoustic Scenes 2017 Database is used for the evaluation of the system. The model with wavelet energy features achieves 74.8 % and 60.2 % on the development and evaluation sets, respectively, outperforming the model using the temporal and spectral feature set (72.9 % and 59.4 %). Additionally, to optimise the generalisation and robustness of the models, a decision fusion method based on the posterior probability of each audio scene is applied. Compared with the baseline system of the Detection and Classification of Acoustic Scenes and Events 2017 (DCASE 2017) challenge, the best decision fusion model achieves 79.2 % and 63.8 % on the development and evaluation sets, respectively; both results significantly exceed the baseline system results of 74.8 % and 61.0 % (confirmed by a one-tailed z-test, p < 0.01 and p < 0.05, respectively).
KW - Acoustic Scene Classification
KW - Machine Learning
KW - Neural Networks
KW - Wavelets
UR - http://www.scopus.com/inward/record.url?scp=85078485967&partnerID=8YFLogxK
U2 - 10.1007/978-981-15-2756-2_11
DO - 10.1007/978-981-15-2756-2_11
M3 - Conference contribution
AN - SCOPUS:85078485967
SN - 9789811527555
T3 - Lecture Notes in Electrical Engineering
SP - 133
EP - 143
BT - Proceedings of the 7th Conference on Sound and Music Technology (CSMT 2019), Revised Selected Papers
A2 - Li, Haifeng
A2 - Ma, Lin
A2 - Li, Shengchen
A2 - Fang, Chunying
A2 - Zhu, Yidan
PB - Springer
T2 - 7th Conference on Sound and Music Technology, CSMT 2019
Y2 - 26 December 2019 through 29 December 2019
ER -