Abstract
The human sound classification task aims at distinguishing different sounds made by human, which can be widely used in medical and health detection area. Different from other sounds in acoustic scene classification task, human sounds can be transmitted either through air or bone conduction. The bone conducted (BC) signal generated by a speaker has strong anti-noise properties and can assist the air conducted (AC) signal to extract additional acoustic features. In this paper, we explore the effect of the BC signal on human sound classification task. Two stream audios combing BC and AC signals are input to a CNN-based model. An attentional feature fusion method suitable for BC and AC signal features is proposed to improve the performance according to the complementarity between the two signal features. Further improvement can be obtained by using a BC signal feature enhancement method. Experiments on an open access and a self-built dataset show that fusing bone conducted signal can achieve 6.2%/17.4% performance improvement over the baseline with only AC signal as input. The results demonstrate the application value of bone conducted signals and the superior performance of the proposed methods.
Original language | English |
---|---|
Pages (from-to) | 1506-1510 |
Number of pages | 5 |
Journal | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
Volume | 2022-September |
DOIs | |
Publication status | Published - 2022 |
Event | 23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022 - Incheon, Korea, Republic of Duration: 18 Sept 2022 → 22 Sept 2022 |
Keywords
- attentional feature fusion
- bone conducted signal
- feature enhancement
- human sound classification