Human Sound Classification based on Feature Fusion Method with Air and Bone Conducted Signal

Liang Xu; Jing Wang; Lizhong Wang; Sijun Bi; Jianqian Zhang; Qiuyue Ma

doi:10.21437/Interspeech.2022-348

School of Information and Electronics

Research output: Contribution to journal › Conference article › peer-review

2 Citations (Scopus)

Abstract

The human sound classification task aims to distinguish different sounds made by humans and can be widely used in medical and health detection. Unlike other sounds in acoustic scene classification tasks, human sounds can be transmitted through either air or bone conduction. The bone-conducted (BC) signal generated by a speaker is highly robust to noise and can complement the air-conducted (AC) signal in extracting additional acoustic features. In this paper, we explore the effect of the BC signal on the human sound classification task. Two-stream audio combining BC and AC signals is input to a CNN-based model. An attentional feature fusion method suited to BC and AC signal features is proposed to improve performance by exploiting the complementarity between the two signal features. Further improvement can be obtained by using a BC signal feature enhancement method. Experiments on an open-access dataset and a self-built dataset show that fusing the bone-conducted signal achieves a 6.2%/17.4% performance improvement over the baseline with only the AC signal as input. The results demonstrate the application value of bone-conducted signals and the superior performance of the proposed methods.
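
The idea of attention-weighted fusion of two feature streams can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not specify the fusion architecture, so the function name `attentional_fusion`, the per-dimension sigmoid gate, and the `weights` and `bias` parameters are all assumptions chosen for illustration. The sketch assumes AC and BC features have already been extracted as equal-length vectors.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def attentional_fusion(ac, bc, weights, bias):
    """Fuse two equal-length feature vectors with a per-dimension gate.

    Hypothetical illustration only (not the paper's architecture):
        gate[i]  = sigmoid(weights[i] * (ac[i] + bc[i]) + bias[i])
        fused[i] = gate[i] * ac[i] + (1 - gate[i]) * bc[i]
    In a trained model, `weights` and `bias` would be learned.
    """
    if not (len(ac) == len(bc) == len(weights) == len(bias)):
        raise ValueError("all inputs must have the same length")
    fused = []
    for a, b, w, c in zip(ac, bc, weights, bias):
        gate = sigmoid(w * (a + b) + c)  # attention weight for the AC stream
        fused.append(gate * a + (1.0 - gate) * b)
    return fused


# With zero weights and bias the gate is 0.5, so the result is a plain average.
ac_features = [1.0, 2.0]
bc_features = [3.0, 4.0]
print(attentional_fusion(ac_features, bc_features, [0.0, 0.0], [0.0, 0.0]))
```

A large positive bias pushes the gate toward 1 and the output toward the AC features; a large negative bias favors the BC features. A learned gate lets the model choose between these extremes per feature dimension.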

Original language	English
Pages (from-to)	1506-1510
Number of pages	5
Journal	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume	2022-September
DOIs	https://doi.org/10.21437/Interspeech.2022-348
Publication status	Published - 2022
Event	23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022 - Incheon, Korea, Republic of
Duration	18 Sept 2022 → 22 Sept 2022

Keywords

attentional feature fusion
bone conducted signal
feature enhancement
human sound classification

Cite this

@article{bd79385f89de4e85a5e1fe3ec2e1540f,

title = "Human Sound Classification based on Feature Fusion Method with Air and Bone Conducted Signal",

abstract = "The human sound classification task aims to distinguish different sounds made by humans and can be widely used in medical and health detection. Unlike other sounds in acoustic scene classification tasks, human sounds can be transmitted through either air or bone conduction. The bone-conducted (BC) signal generated by a speaker is highly robust to noise and can complement the air-conducted (AC) signal in extracting additional acoustic features. In this paper, we explore the effect of the BC signal on the human sound classification task. Two-stream audio combining BC and AC signals is input to a CNN-based model. An attentional feature fusion method suited to BC and AC signal features is proposed to improve performance by exploiting the complementarity between the two signal features. Further improvement can be obtained by using a BC signal feature enhancement method. Experiments on an open-access dataset and a self-built dataset show that fusing the bone-conducted signal achieves a 6.2\%/17.4\% performance improvement over the baseline with only the AC signal as input. The results demonstrate the application value of bone-conducted signals and the superior performance of the proposed methods.",

keywords = "attentional feature fusion, bone conducted signal, feature enhancement, human sound classification",

author = "Liang Xu and Jing Wang and Lizhong Wang and Sijun Bi and Jianqian Zhang and Qiuyue Ma",

note = "Publisher Copyright: Copyright {\textcopyright} 2022 ISCA.; 23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022 ; Conference date: 18-09-2022 Through 22-09-2022",

year = "2022",

doi = "10.21437/Interspeech.2022-348",

language = "English",

volume = "2022-September",

pages = "1506--1510",

journal = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",

issn = "2308-457X",

}

TY - JOUR

T1 - Human Sound Classification based on Feature Fusion Method with Air and Bone Conducted Signal

AU - Xu, Liang

AU - Wang, Jing

AU - Wang, Lizhong

AU - Bi, Sijun

AU - Zhang, Jianqian

AU - Ma, Qiuyue

PY - 2022

Y1 - 2022

N2 - The human sound classification task aims to distinguish different sounds made by humans and can be widely used in medical and health detection. Unlike other sounds in acoustic scene classification tasks, human sounds can be transmitted through either air or bone conduction. The bone-conducted (BC) signal generated by a speaker is highly robust to noise and can complement the air-conducted (AC) signal in extracting additional acoustic features. In this paper, we explore the effect of the BC signal on the human sound classification task. Two-stream audio combining BC and AC signals is input to a CNN-based model. An attentional feature fusion method suited to BC and AC signal features is proposed to improve performance by exploiting the complementarity between the two signal features. Further improvement can be obtained by using a BC signal feature enhancement method. Experiments on an open-access dataset and a self-built dataset show that fusing the bone-conducted signal achieves a 6.2%/17.4% performance improvement over the baseline with only the AC signal as input. The results demonstrate the application value of bone-conducted signals and the superior performance of the proposed methods.

KW - attentional feature fusion

KW - bone conducted signal

KW - feature enhancement

KW - human sound classification

UR - http://www.scopus.com/inward/record.url?scp=85140059155&partnerID=8YFLogxK

U2 - 10.21437/Interspeech.2022-348

DO - 10.21437/Interspeech.2022-348

M3 - Conference article

AN - SCOPUS:85140059155

SN - 2308-457X

VL - 2022-September

SP - 1506

EP - 1510

JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

T2 - 23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022

Y2 - 18 September 2022 through 22 September 2022

ER -
