Human Sound Classification based on Feature Fusion Method with Air and Bone Conducted Signal

Liang Xu; Jing Wang; Lizhong Wang; Sijun Bi; Jianqian Zhang; Qiuyue Ma

doi:10.21437/Interspeech.2022-348

School of Information and Electronics

Research output: Contribution to journal › Conference article › Peer-reviewed

2 citations (Scopus)

Abstract

The human sound classification task aims to distinguish different sounds made by humans and can be widely used in medical and health detection. Unlike other sounds in acoustic scene classification tasks, human sounds can be transmitted through either air or bone conduction. The bone-conducted (BC) signal generated by a speaker is strongly noise-resistant and can complement the air-conducted (AC) signal by providing additional acoustic features. In this paper, we explore the effect of the BC signal on the human sound classification task. Two-stream audio combining BC and AC signals is input to a CNN-based model. An attentional feature fusion method suited to BC and AC signal features is proposed to improve performance by exploiting the complementarity between the two signal features. Further improvement is obtained with a BC signal feature enhancement method. Experiments on an open-access dataset and a self-built dataset show that fusing the bone-conducted signal achieves a 6.2%/17.4% performance improvement over the baseline that uses only the AC signal as input. The results demonstrate the application value of bone-conducted signals and the superior performance of the proposed methods.
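
The abstract does not spell out the fusion mechanism, so the following is only a minimal sketch of what an attention-weighted fusion of AC and BC feature maps could look like. It assumes a squeeze-and-excitation style channel gate; the function name `attentional_fusion`, the weight matrices `w1` and `w2`, and all shapes are hypothetical and are not taken from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attentional_fusion(ac_feat, bc_feat, w1, w2):
    """Fuse two (channels, time, freq) feature maps with a channel gate.

    Illustrative assumption, not the authors' design: a per-channel gate
    computed from the summed features decides how much weight the AC
    stream gets; the BC stream gets the complementary weight.
    """
    summed = ac_feat + bc_feat
    pooled = summed.mean(axis=(1, 2))        # global average pool -> (C,)
    hidden = np.maximum(w1 @ pooled, 0.0)    # bottleneck + ReLU -> (C // r,)
    gate = sigmoid(w2 @ hidden)              # per-channel weight in (0, 1)
    gate = gate[:, None, None]               # broadcast over time and freq
    fused = gate * ac_feat + (1.0 - gate) * bc_feat
    return fused, gate

# Toy inputs standing in for CNN feature maps of the two streams.
rng = np.random.default_rng(0)
C, T, F, r = 8, 10, 16, 4                    # channels, time, freq, reduction
ac = rng.standard_normal((C, T, F))
bc = rng.standard_normal((C, T, F))
w1 = rng.standard_normal((C // r, C)) * 0.1  # random stand-ins for learned weights
w2 = rng.standard_normal((C, C // r)) * 0.1
fused, gate = attentional_fusion(ac, bc, w1, w2)
print(fused.shape)
```

In a trained model the gate weights would be learned jointly with the CNN, letting the network lean on the BC stream in channels where the AC stream is noisy. With all-zero weights the gate is 0.5 everywhere and the fusion reduces to a plain average of the two streams.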

Original language	English
Pages (from-to)	1506-1510
Number of pages	5
Journal	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume	2022-September
DOI	https://doi.org/10.21437/Interspeech.2022-348
Publication status	Published - 2022
Event	23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022 - Incheon, South Korea. Duration: 18 Sep 2022 → 22 Sep 2022

Cite this

@article{bd79385f89de4e85a5e1fe3ec2e1540f,

title = "Human Sound Classification based on Feature Fusion Method with Air and Bone Conducted Signal",

abstract = "The human sound classification task aims to distinguish different sounds made by humans and can be widely used in medical and health detection. Unlike other sounds in acoustic scene classification tasks, human sounds can be transmitted through either air or bone conduction. The bone-conducted (BC) signal generated by a speaker is strongly noise-resistant and can complement the air-conducted (AC) signal by providing additional acoustic features. In this paper, we explore the effect of the BC signal on the human sound classification task. Two-stream audio combining BC and AC signals is input to a CNN-based model. An attentional feature fusion method suited to BC and AC signal features is proposed to improve performance by exploiting the complementarity between the two signal features. Further improvement is obtained with a BC signal feature enhancement method. Experiments on an open-access dataset and a self-built dataset show that fusing the bone-conducted signal achieves a 6.2%/17.4% performance improvement over the baseline that uses only the AC signal as input. The results demonstrate the application value of bone-conducted signals and the superior performance of the proposed methods.",

keywords = "attentional feature fusion, bone conducted signal, feature enhancement, human sound classification",

author = "Liang Xu and Jing Wang and Lizhong Wang and Sijun Bi and Jianqian Zhang and Qiuyue Ma",

note = "Publisher Copyright: Copyright {\textcopyright} 2022 ISCA.; 23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022 ; Conference date: 18-09-2022 Through 22-09-2022",

year = "2022",

doi = "10.21437/Interspeech.2022-348",

language = "English",

volume = "2022-September",

pages = "1506--1510",

journal = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",

issn = "2308-457X",

}

TY - JOUR

T1 - Human Sound Classification based on Feature Fusion Method with Air and Bone Conducted Signal

AU - Xu, Liang

AU - Wang, Jing

AU - Wang, Lizhong

AU - Bi, Sijun

AU - Zhang, Jianqian

AU - Ma, Qiuyue

PY - 2022

Y1 - 2022

AB - The human sound classification task aims to distinguish different sounds made by humans and can be widely used in medical and health detection. Unlike other sounds in acoustic scene classification tasks, human sounds can be transmitted through either air or bone conduction. The bone-conducted (BC) signal generated by a speaker is strongly noise-resistant and can complement the air-conducted (AC) signal by providing additional acoustic features. In this paper, we explore the effect of the BC signal on the human sound classification task. Two-stream audio combining BC and AC signals is input to a CNN-based model. An attentional feature fusion method suited to BC and AC signal features is proposed to improve performance by exploiting the complementarity between the two signal features. Further improvement is obtained with a BC signal feature enhancement method. Experiments on an open-access dataset and a self-built dataset show that fusing the bone-conducted signal achieves a 6.2%/17.4% performance improvement over the baseline that uses only the AC signal as input. The results demonstrate the application value of bone-conducted signals and the superior performance of the proposed methods.

KW - attentional feature fusion

KW - bone conducted signal

KW - feature enhancement

KW - human sound classification

UR - http://www.scopus.com/inward/record.url?scp=85140059155&partnerID=8YFLogxK

U2 - 10.21437/Interspeech.2022-348

DO - 10.21437/Interspeech.2022-348

M3 - Conference article

AN - SCOPUS:85140059155

SN - 2308-457X

VL - 2022-September

SP - 1506

EP - 1510

JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

T2 - 23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022

Y2 - 18 September 2022 through 22 September 2022

ER -
