Semi-Supervised Sound Event Detection with Pre-Trained Model

Liang Xu*, Lizhong Wang, Sijun Bi*, Hanyue Liu*, Jing Wang*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

8 Citations (Scopus)

Abstract

Sound event detection (SED) is an interesting but challenging task because of the scarcity of data and the diversity of sound events in real life. In this paper, we focus on the semi-supervised SED task and incorporate a pre-trained model from another field to help improve detection performance. Pre-trained models have been widely used in various speech and audio tasks, such as automatic speech recognition and audio tagging. If the pre-training dataset is large and general enough, the embedding features extracted by the pre-trained model can capture latent information relevant to the target task. We use the pre-trained audio neural networks (PANNs), which are well suited to the SED task, and propose two methods for fusing the features from PANNs with those of the original model. In addition, we propose a weight-raised temporal contrastive loss to speed up the model's switching at event boundaries and improve smoothness within events. Experimental results show that using the pre-trained model features outperforms the baseline by 8.5% and 9.1% in terms of the polyphonic sound detection score (PSDS) on the DESED public evaluation dataset.
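The abstract does not specify either of the two fusion methods or the exact form of the weight-raised temporal contrastive loss, so the following is only a minimal pure-Python sketch of the general ideas, under stated assumptions. It assumes the pre-trained (PANNs-style) embeddings and the SED model's features describe the same clip at different frame rates, aligns them by linear interpolation, and fuses them by concatenation; concatenation is one common fusion strategy, not necessarily either of the paper's methods. The loss is a generic adjacent-frame contrastive loss in which pairs straddling an event boundary receive a larger weight, which is one possible reading of "weight raised". All names (`align_frames`, `fuse_concat`, `temporal_contrastive_loss`, `margin`, `boundary_weight`) are hypothetical and do not come from the paper or the PANNs codebase.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def align_frames(emb: Sequence[Sequence[float]], n_target: int) -> Matrix:
    """Linearly resample a (T_src x D) embedding sequence to n_target frames.

    Assumption (hypothetical): both feature streams cover the same clip,
    only at different frame rates.
    """
    t_src = len(emb)
    out: Matrix = []
    for i in range(n_target):
        # Map target frame i onto a fractional position in the source sequence.
        pos = 0.0 if n_target == 1 else i * (t_src - 1) / (n_target - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, t_src - 1)
        frac = pos - lo
        out.append([(1 - frac) * a + frac * b for a, b in zip(emb[lo], emb[hi])])
    return out


def fuse_concat(sed_feats: Sequence[Sequence[float]],
                pretrained_emb: Sequence[Sequence[float]]) -> Matrix:
    """Hypothetical fusion: align the pre-trained embeddings, then concatenate per frame."""
    aligned = align_frames(pretrained_emb, len(sed_feats))
    return [list(f) + a for f, a in zip(sed_feats, aligned)]


def _cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-8)


def temporal_contrastive_loss(emb: Sequence[Sequence[float]],
                              labels: Sequence[Sequence[int]],
                              margin: float = 0.5,
                              boundary_weight: float = 2.0) -> float:
    """Generic adjacent-frame contrastive loss (hypothetical sketch, not the paper's).

    emb: (T x D) frame embeddings; labels: (T x C) binary frame labels.
    Neighbouring frames with identical labels are pulled together (smoothness);
    neighbours whose label vector changes (an event boundary) are pushed below
    `margin` similarity, and those pairs get the larger `boundary_weight`.
    """
    total, norm = 0.0, 0.0
    for t in range(len(emb) - 1):
        sim = _cosine(emb[t], emb[t + 1])
        if list(labels[t]) != list(labels[t + 1]):
            # Event boundary: penalise neighbours that remain too similar.
            w, term = boundary_weight, max(0.0, sim - margin)
        else:
            # Inside an event or background segment: penalise dissimilar neighbours.
            w, term = 1.0, 1.0 - sim
        total += w * term
        norm += w
    return total / norm if norm else 0.0
```

For example, identical embeddings on either side of a label change yield a loss of about 0.5 with the default margin, while orthogonal embeddings inside a single event yield 1.0. Raising `boundary_weight` makes boundary errors dominate the average, which is the intuition behind emphasizing fast switching at event onsets and offsets.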

Original language: English
Title of host publication: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing, Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781728163277
Publication status: Published, 2023
Event: 48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023, Rhodes Island, Greece
Duration: 4 Jun 2023 to 10 Jun 2023

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 2023-June
ISSN (Print): 1520-6149

Conference

Conference: 48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023
Country/Territory: Greece
City: Rhodes Island
Period: 4/06/23 to 10/06/23

Keywords

  • mean-teacher
  • pre-trained
  • sound event detection
  • temporal contrastive loss
