Soft-Median Selection: An adaptive feature smoothening method for sound event detection

Fengnian Zhao; Ruwei Li; Xin Liu; Liwen Xu

doi:10.1016/j.apacoust.2022.108715

Corresponding author: Ruwei Li

Research output: Contribution to journal › Article › peer-review

3 Citations (Scopus)

Abstract

The existing Sound Event Detection (SED) algorithms pay too much attention to the differences between the internal frames of the events but do not pay enough attention to their boundaries. This situation leads to event splitting, false negatives, and inaccurate start and end times, reducing the SED performance. In order to solve this problem, this paper proposes the Soft-Median Selection (SMS) to smoothen the features of frames in the time axis adaptively. Firstly, the Differentiable Soft-Median Filter (DSMF) is designed as a filter to be applied to a neural network appropriately. Secondly, the DSMFs and a Linear Selection are combined as the SMS. The DSMFs of different lengths are used to smoothen the features to different degrees, and the Linear Selection adaptively synthesizes the smoothened features. Since the weight of each DSMF is learned, SMS can adaptively smoothen features without setting parameters in advance and thus has good generalization ability. The proposed DSMF solves the problem that the gradient cannot propagate across the median filter, and the propagation is not smooth. The experimental results show that the proposed SED algorithm based on SMS can effectively improve edge detection accuracy and make the internal prediction results of sound events more stable. The SMS-based SED algorithm's Event-based F1 Score (EBFS) is 21.7% higher than the baseline and 3.0% higher than the winning algorithm in Task 4 of Detection and Classification of Acoustic Scenes and Events (DCASE) 2019.
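The abstract does not give the DSMF formula, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes one common way to soften a median: each value in a window is weighted by a softmax over its negative total absolute deviation from the other values, so the result approaches the true median as the temperature shrinks while every input keeps a nonzero weight. The function names, the `temperature` parameter, the edge padding, and the fixed `mix_weights` (which stand in for the learned Linear Selection weights) are all hypothetical.

```python
import math


def soft_median(window, temperature=0.1):
    """Softmax-weighted average that approaches the median as temperature -> 0."""
    # Cost of each candidate value: its total absolute deviation from the window.
    # The ordinary median is a window value that minimizes this cost.
    costs = [sum(abs(c - v) for v in window) for c in window]
    # Numerically stable softmax over the negative costs.
    lowest = min(costs)
    weights = [math.exp(-(c - lowest) / temperature) for c in costs]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, window)) / total


def soft_median_filter(sequence, length, temperature=0.1):
    """Slide soft_median along the time axis (odd length, edge-padded)."""
    half = length // 2
    padded = [sequence[0]] * half + list(sequence) + [sequence[-1]] * half
    return [
        soft_median(padded[i:i + length], temperature)
        for i in range(len(sequence))
    ]


def soft_median_selection(sequence, lengths, mix_weights, temperature=0.1):
    """Linearly combine filters of several lengths.

    mix_weights are fixed here for illustration; in a network they would be
    trainable parameters (an assumption based on the abstract).
    """
    filtered = [soft_median_filter(sequence, n, temperature) for n in lengths]
    return [
        sum(w * f[t] for w, f in zip(mix_weights, filtered))
        for t in range(len(sequence))
    ]


if __name__ == "__main__":
    # A single-frame spike inside an otherwise silent stretch.
    frames = [0.0, 0.0, 1.0, 0.0, 0.0]
    print(soft_median_selection(frames, lengths=[3, 5], mix_weights=[0.5, 0.5]))
```

A lower `temperature` makes each filter behave more like a hard median; a higher one moves it toward a plain moving average.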

Original language	English
Article number	108715
Journal	Applied Acoustics
Volume	192
DOIs	https://doi.org/10.1016/j.apacoust.2022.108715
Publication status	Published - Apr 2022
Externally published	Yes

Keywords

Differentiable Soft-Median Filter (DSMF)
Soft-Median Selection (SMS)
Sound Event Detection (SED)

Cite this

Zhao, F., Li, R., Liu, X., & Xu, L. (2022). Soft-Median Selection: An adaptive feature smoothening method for sound event detection. Applied Acoustics, 192, Article 108715. https://doi.org/10.1016/j.apacoust.2022.108715

@article{d00f24d783d14df091455091308f39b6,
  title = "Soft-Median Selection: An adaptive feature smoothening method for sound event detection",
  abstract = "The existing Sound Event Detection (SED) algorithms pay too much attention to the differences between the internal frames of the events but do not pay enough attention to their boundaries. This situation leads to event splitting, false negatives, and inaccurate start and end times, reducing the SED performance. In order to solve this problem, this paper proposes the Soft-Median Selection (SMS) to smoothen the features of frames in the time axis adaptively. Firstly, the Differentiable Soft-Median Filter (DSMF) is designed as a filter to be applied to a neural network appropriately. Secondly, the DSMFs and a Linear Selection are combined as the SMS. The DSMFs of different lengths are used to smoothen the features to different degrees, and the Linear Selection adaptively synthesizes the smoothened features. Since the weight of each DSMF is learned, SMS can adaptively smoothen features without setting parameters in advance and thus has good generalization ability. The proposed DSMF solves the problem that the gradient cannot propagate across the median filter, and the propagation is not smooth. The experimental results show that the proposed SED algorithm based on SMS can effectively improve edge detection accuracy and make the internal prediction results of sound events more stable. The SMS-based SED algorithm's Event-based F1 Score (EBFS) is 21.7\% higher than the baseline and 3.0\% higher than the winning algorithm in Task 4 of Detection and Classification of Acoustic Scenes and Events (DCASE) 2019.",
  keywords = "Differentiable Soft-Median Filter (DSMF), Soft-Median Selection (SMS), Sound Event Detection (SED)",
  author = "Fengnian Zhao and Ruwei Li and Xin Liu and Liwen Xu",
  note = "Publisher Copyright: {\textcopyright} 2022 Elsevier Ltd",
  year = "2022",
  month = apr,
  doi = "10.1016/j.apacoust.2022.108715",
  language = "English",
  volume = "192",
  journal = "Applied Acoustics",
  issn = "0003-682X",
  publisher = "Elsevier Ltd.",
}

TY - JOUR
T1 - Soft-Median Selection
T2 - An adaptive feature smoothening method for sound event detection
AU - Zhao, Fengnian
AU - Li, Ruwei
AU - Liu, Xin
AU - Xu, Liwen
PY - 2022/4
Y1 - 2022/4
AB - The existing Sound Event Detection (SED) algorithms pay too much attention to the differences between the internal frames of the events but do not pay enough attention to their boundaries. This situation leads to event splitting, false negatives, and inaccurate start and end times, reducing the SED performance. In order to solve this problem, this paper proposes the Soft-Median Selection (SMS) to smoothen the features of frames in the time axis adaptively. Firstly, the Differentiable Soft-Median Filter (DSMF) is designed as a filter to be applied to a neural network appropriately. Secondly, the DSMFs and a Linear Selection are combined as the SMS. The DSMFs of different lengths are used to smoothen the features to different degrees, and the Linear Selection adaptively synthesizes the smoothened features. Since the weight of each DSMF is learned, SMS can adaptively smoothen features without setting parameters in advance and thus has good generalization ability. The proposed DSMF solves the problem that the gradient cannot propagate across the median filter, and the propagation is not smooth. The experimental results show that the proposed SED algorithm based on SMS can effectively improve edge detection accuracy and make the internal prediction results of sound events more stable. The SMS-based SED algorithm's Event-based F1 Score (EBFS) is 21.7% higher than the baseline and 3.0% higher than the winning algorithm in Task 4 of Detection and Classification of Acoustic Scenes and Events (DCASE) 2019.

KW - Differentiable Soft-Median Filter (DSMF)
KW - Soft-Median Selection (SMS)
KW - Sound Event Detection (SED)
UR - http://www.scopus.com/inward/record.url?scp=85126572500&partnerID=8YFLogxK
U2 - 10.1016/j.apacoust.2022.108715
DO - 10.1016/j.apacoust.2022.108715
M3 - Article
AN - SCOPUS:85126572500
SN - 0003-682X
VL - 192
JO - Applied Acoustics
JF - Applied Acoustics
M1 - 108715
ER -
