TY - JOUR
T1 - Soft-Median Selection
T2 - An adaptive feature smoothening method for sound event detection
AU - Zhao, Fengnian
AU - Li, Ruwei
AU - Liu, Xin
AU - Xu, Liwen
N1 - Publisher Copyright: © 2022 Elsevier Ltd
PY - 2022/4
Y1 - 2022/4
AB - Existing Sound Event Detection (SED) algorithms pay too much attention to the differences between frames inside an event and too little to event boundaries. This leads to event splitting, false negatives, and inaccurate start and end times, all of which reduce SED performance. To solve this problem, this paper proposes Soft-Median Selection (SMS), which adaptively smoothens frame features along the time axis. First, a Differentiable Soft-Median Filter (DSMF) is designed so that the filter can be used inside a neural network. Second, several DSMFs and a Linear Selection are combined to form the SMS. DSMFs of different lengths smoothen the features to different degrees, and the Linear Selection adaptively combines the smoothened features. Because the weight of each DSMF is learned, SMS smoothens features adaptively without parameters set in advance and therefore has good generalization ability. The proposed DSMF addresses the problem that gradients cannot propagate, or do not propagate smoothly, through a median filter. Experimental results show that the proposed SMS-based SED algorithm effectively improves event-boundary detection accuracy and makes the predictions inside sound events more stable. Its Event-based F1 Score (EBFS) is 21.7% higher than the baseline and 3.0% higher than the winning algorithm in Task 4 of Detection and Classification of Acoustic Scenes and Events (DCASE) 2019.
KW - Differentiable Soft-Median Filter (DSMF)
KW - Soft-Median Selection (SMS)
KW - Sound Event Detection (SED)
UR - http://www.scopus.com/inward/record.url?scp=85126572500&partnerID=8YFLogxK
U2 - 10.1016/j.apacoust.2022.108715
DO - 10.1016/j.apacoust.2022.108715
M3 - Article
AN - SCOPUS:85126572500
SN - 0003-682X
VL - 192
JO - Applied Acoustics
JF - Applied Acoustics
M1 - 108715
ER -