基于改进池化层的弱标记声音事件检测

Translated title of the contribution: Weakly Labeled Sound Event Detection Based on Improved Pooling Layer

Miao Liu, Jing Wang, Guiguan Dong, Weiming Yi

Research output: Contribution to journal › Article › peer-review

2 Citations (Scopus)

Abstract

Using the large-scale weakly labeled dataset provided by Task 4 of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2017 Challenge, we built a multi-class sound event detection system based on Mel filter bank (Fbank) features, convolutional neural networks (CNN), and recurrent neural networks (RNN). We analyze the partial-derivative derivations of two common existing pooling layers, attention and linear softmax, during neural network backpropagation. Building on the linear softmax pooling layer, we propose an "exponential learnable power function softmax" pooling layer. Experimental results show that, compared with the first-place model in the DCASE competition, the sound event detection system using the proposed pooling function raises the clip-level F1 score of sound event prediction from 0.556 to 0.652 and the frame-level F1 score from 0.555 to 0.583, and changes the frame-level error rate (ER) from 0.660 to 0.667.
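
To make the pooling idea concrete, the following is a minimal sketch in plain Python of how frame-level probabilities can be aggregated into one clip-level probability. The function `linear_softmax_pool` follows the usual definition of linear softmax pooling (each frame weighted by its own probability). The function `power_softmax_pool` and its exponent `n` are assumptions made for illustration: the abstract does not give the formula of the proposed "exponential learnable power function softmax" layer, so this is a generic power-weighted variant and not the authors' definition. The sketch also assumes `n >= 0`; in a trainable model `n` would be a learnable parameter.

```python
def linear_softmax_pool(frame_probs):
    """Clip-level probability where each frame is weighted by its own probability."""
    total = sum(frame_probs)
    if total == 0.0:
        return 0.0
    return sum(p * p for p in frame_probs) / total


def power_softmax_pool(frame_probs, n):
    """Hypothetical power-weighted pooling (illustrative, not the paper's formula).

    Each frame is weighted by p ** n, with n >= 0 assumed:
    n = 0 gives mean pooling, n = 1 gives linear softmax pooling,
    and larger n moves the result toward the maximum frame probability.
    """
    weights = [p ** n for p in frame_probs]
    total = sum(weights)
    if total == 0.0:
        return 0.0
    return sum(p * w for p, w in zip(frame_probs, weights)) / total


# Made-up frame-level probabilities for one event class in one clip.
frame_probs = [0.1, 0.2, 0.9, 0.8]

clip_linear = linear_softmax_pool(frame_probs)
clip_mean = power_softmax_pool(frame_probs, 0.0)
clip_sharp = power_softmax_pool(frame_probs, 8.0)
```

With these made-up inputs, a larger exponent lets the high-probability frames dominate the clip-level score, which is the kind of behavior a learnable exponent would allow the network to tune per task.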

Original language: Chinese (Traditional)
Pages (from-to): 1907-1913
Number of pages: 7
Journal: Journal of Signal Processing
Volume: 37
Issue number: 10
Publication status: Published - Oct 2021
