Frequency Axis Pooling Method for Weakly Labeled Sound Event Detection and Classification

Miao Liu, Jing Wang, Yujun Wang, Lidong Yang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Citation (Scopus)

Abstract

Recently, the convolutional recurrent neural network (CRNN) has been widely used in weakly labeled sound event detection (SED) and audio tagging (AT) tasks. However, existing network designs may not make good use of information along the frequency dimension, which can cause information loss or redundancy. We propose a frequency axis pooling method to further boost the representation power of the CRNN. Building on existing pooling functions, frequency axis pooling is applied to the feature map before it is fed to the recurrent neural network (RNN) in the CRNN. Compared with the method without frequency axis pooling, our method assigns different weights to different frequency dimensions during compression, which compresses frequency information better and reduces information redundancy. To evaluate the proposed method, three commonly used pooling functions are compared on the frequency axis using the DCASE 2017 Task 4 dataset. The experimental results show that reasonable compression of frequency information significantly improves performance on the AT and SED tasks. Among the three, frequency axis pooling based on linear softmax performs best on both tasks.
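As a rough illustration of the idea in the abstract, the sketch below applies linear softmax pooling along the frequency axis of a CNN feature map before it would be passed to an RNN. It is not the authors' implementation. The tensor layout (batch, channels, time, frequency), the array sizes, the function name linear_softmax_pool, and the eps term are all assumptions made for this example. The pooling formula used here, sum(x^2) / sum(x), is the common definition of linear softmax pooling and assumes non-negative activations.

```python
import numpy as np


def linear_softmax_pool(x, axis=-1, eps=1e-8):
    """Linear softmax pooling: sum(x**2) / sum(x) along `axis`.

    Each value is weighted by itself, so larger activations contribute more.
    Assumes x is non-negative (for example, post-ReLU or post-sigmoid).
    `eps` is a hypothetical guard against division by zero.
    """
    x = np.asarray(x, dtype=np.float64)
    return (x * x).sum(axis=axis) / (x.sum(axis=axis) + eps)


# Hypothetical CNN feature map with layout (batch, channels, time, frequency).
rng = np.random.default_rng(0)
feature_map = rng.random((2, 8, 50, 16))  # values in [0, 1)

# Collapse the frequency axis: (2, 8, 50, 16) -> (2, 8, 50).
pooled = linear_softmax_pool(feature_map, axis=-1)

# Rearrange to (batch, time, features), a typical input layout for an RNN.
rnn_input = pooled.transpose(0, 2, 1)  # (2, 50, 8)
```

For non-negative inputs, the pooled value falls between the mean and the maximum over the frequency axis, which is one way to read the abstract's remark that different frequency bins receive different weights during compression.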

Original language: English
Title of host publication: 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 945-949
Number of pages: 5
ISBN (Electronic): 9789881476890
Publication status: Published - 2021
Event: 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Tokyo, Japan
Duration: 14 Dec 2021 to 17 Dec 2021

Publication series

Name: 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Proceedings

Conference

Conference: 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021
Country/Territory: Japan
City: Tokyo
Period: 14/12/21 to 17/12/21
