Semi-Supervised Sound Event Detection with Pre-Trained Model

Liang Xu*, Lizhong Wang, Sijun Bi*, Hanyue Liu*, Jing Wang*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-reviewed

8 Citations (Scopus)

Abstract

Sound event detection (SED) is an interesting but challenging task due to the scarcity of data and the diversity of sound events in real life. In this paper, we focus on the semi-supervised SED task and incorporate a pre-trained model from another field to improve detection performance. Pre-trained models have been widely used in various tasks in the field of speech, such as automatic speech recognition and audio tagging. If the training dataset is large and general enough, the embedding features extracted by the pre-trained model will cover the potential information in the original task. We use the pre-trained model PANNs, which is suitable for the SED task, and propose two methods to fuse the features from PANNs with those of the original model. In addition, we propose a weight raised temporal contrastive loss to improve the model's switching speed at event boundaries and the smoothness within events. Experimental results show that using pre-trained model features outperforms the baseline by 8.5% and 9.1% on the DESED public evaluation dataset in terms of polyphonic sound detection score (PSDS).
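
The abstract does not say how the two fusion methods work, so the following is only an illustrative sketch of one generic way to combine frame-level features from a pre-trained model with those of an SED encoder: resample the pre-trained embedding to the SED frame rate, then concatenate along the feature axis. The function names (`align_frames`, `fuse_by_concat`), the array shapes, and the nearest-neighbour alignment are all assumptions made for this example, not details taken from the paper.

```python
import numpy as np


def align_frames(embedding: np.ndarray, target_frames: int) -> np.ndarray:
    """Resample a (frames, dims) array along time by nearest-neighbour indexing.

    Hypothetical helper: it repeats or drops frames so that the embedding
    has exactly `target_frames` rows.
    """
    source_frames = embedding.shape[0]
    # Integer arithmetic maps each target frame to a valid source frame index.
    idx = (np.arange(target_frames) * source_frames) // target_frames
    return embedding[idx]


def fuse_by_concat(sed_features: np.ndarray, pretrained_embedding: np.ndarray) -> np.ndarray:
    """Concatenate time-aligned features along the feature dimension.

    This is one common fusion strategy, assumed here for illustration only.
    """
    aligned = align_frames(pretrained_embedding, sed_features.shape[0])
    return np.concatenate([sed_features, aligned], axis=1)


# Random stand-ins for real features; the shapes are hypothetical.
rng = np.random.default_rng(0)
sed_features = rng.standard_normal((156, 128))    # (frames, dims) from the SED encoder
pann_embedding = rng.standard_normal((39, 2048))  # (frames, dims) from a pre-trained model

fused = fuse_by_concat(sed_features, pann_embedding)
print(fused.shape)
```

In this sketch the fused array keeps the SED time resolution, so any frame-level classifier placed after it would see both feature sets at every frame.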

Original language: English
Title of host publication: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing, Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781728163277
DOI
Publication status: Published - 2023
Event: 48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023 - Rhodes Island, Greece
Duration: 4 Jun 2023 to 10 Jun 2023

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 2023-June
ISSN (Print): 1520-6149

Conference

Conference: 48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023
Country/Territory: Greece
City: Rhodes Island
Period: 4/06/23 to 10/06/23
