TY - JOUR
T1 - Weakly Labeled Sound Event Detection Based on an Improved Pooling Layer
AU - Liu, Miao
AU - Wang, Jing
AU - Dong, Guiguan
AU - Yi, Weiming
N1 - Publisher Copyright:
© 2021 Editorial Board of Journal of Signal Processing. All rights reserved.
PY - 2021/10
Y1 - 2021/10
N2 - For the large-scale weakly labeled data set provided by Task 4 of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2017 Challenge, we built a multi-class sound event detection system based on Mel filter bank (Fbank) features, convolutional neural networks (CNN), and recurrent neural networks (RNN). In this paper, we analyze the derivation of the partial derivatives of two common existing pooling layers, attention pooling and linear softmax pooling, during neural network backpropagation. Building on the linear softmax pooling layer, we propose an "exponential learnable power function softmax" pooling layer. Experimental results show that, compared with the first-place model in the DCASE competition, the sound event detection system using the proposed "exponential learnable power function softmax" pooling function increases the clip-level F1 score of sound event prediction from 0.556 to 0.652 and the frame-level F1 score from 0.555 to 0.583, and reduces the frame-level error rate (ER) from 0.660 to 0.667.
AB - For the large-scale weakly labeled data set provided by Task 4 of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2017 Challenge, we built a multi-class sound event detection system based on Mel filter bank (Fbank) features, convolutional neural networks (CNN), and recurrent neural networks (RNN). In this paper, we analyze the derivation of the partial derivatives of two common existing pooling layers, attention pooling and linear softmax pooling, during neural network backpropagation. Building on the linear softmax pooling layer, we propose an "exponential learnable power function softmax" pooling layer. Experimental results show that, compared with the first-place model in the DCASE competition, the sound event detection system using the proposed "exponential learnable power function softmax" pooling function increases the clip-level F1 score of sound event prediction from 0.556 to 0.652 and the frame-level F1 score from 0.555 to 0.583, and reduces the frame-level error rate (ER) from 0.660 to 0.667.
KW - exponential learnable power function softmax
KW - pooling function
KW - sound event detection
KW - weakly labeled
UR - http://www.scopus.com/inward/record.url?scp=85173974937&partnerID=8YFLogxK
U2 - 10.16798/j.issn.1003-0530.2021.10.014
DO - 10.16798/j.issn.1003-0530.2021.10.014
M3 - Article
AN - SCOPUS:85173974937
SN - 1003-0530
VL - 37
SP - 1907
EP - 1913
JO - Journal of Signal Processing
JF - Journal of Signal Processing
IS - 10
ER -