TY - JOUR
T1 - Human Sound Classification based on Feature Fusion Method with Air and Bone Conducted Signal
AU - Xu, Liang
AU - Wang, Jing
AU - Wang, Lizhong
AU - Bi, Sijun
AU - Zhang, Jianqian
AU - Ma, Qiuyue
N1 - Publisher Copyright:
Copyright © 2022 ISCA.
PY - 2022
Y1 - 2022
N2 - The human sound classification task aims at distinguishing different sounds made by humans, which can be widely used in medical and health detection areas. Different from other sounds in the acoustic scene classification task, human sounds can be transmitted either through air or bone conduction. The bone conducted (BC) signal generated by a speaker has strong anti-noise properties and can assist the air conducted (AC) signal in extracting additional acoustic features. In this paper, we explore the effect of the BC signal on the human sound classification task. Two-stream audio combining BC and AC signals is input to a CNN-based model. An attentional feature fusion method suitable for BC and AC signal features is proposed to improve performance by exploiting the complementarity between the two signal features. Further improvement can be obtained by using a BC signal feature enhancement method. Experiments on an open-access and a self-built dataset show that fusing the bone conducted signal achieves a 6.2%/17.4% performance improvement over the baseline with only the AC signal as input. The results demonstrate the application value of bone conducted signals and the superior performance of the proposed methods.
AB - The human sound classification task aims at distinguishing different sounds made by humans, which can be widely used in medical and health detection areas. Different from other sounds in the acoustic scene classification task, human sounds can be transmitted either through air or bone conduction. The bone conducted (BC) signal generated by a speaker has strong anti-noise properties and can assist the air conducted (AC) signal in extracting additional acoustic features. In this paper, we explore the effect of the BC signal on the human sound classification task. Two-stream audio combining BC and AC signals is input to a CNN-based model. An attentional feature fusion method suitable for BC and AC signal features is proposed to improve performance by exploiting the complementarity between the two signal features. Further improvement can be obtained by using a BC signal feature enhancement method. Experiments on an open-access and a self-built dataset show that fusing the bone conducted signal achieves a 6.2%/17.4% performance improvement over the baseline with only the AC signal as input. The results demonstrate the application value of bone conducted signals and the superior performance of the proposed methods.
KW - attentional feature fusion
KW - bone conducted signal
KW - feature enhancement
KW - human sound classification
UR - http://www.scopus.com/inward/record.url?scp=85140059155&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2022-348
DO - 10.21437/Interspeech.2022-348
M3 - Conference article
AN - SCOPUS:85140059155
SN - 2308-457X
VL - 2022-September
SP - 1506
EP - 1510
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
T2 - 23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022
Y2 - 18 September 2022 through 22 September 2022
ER -