Explainable Stuttering Recognition Using Axial Attention

Yu Ma; Yuting Huang; Kaixiang Yuan; Guangzhe Xuan; Yongzi Yu; Hengrui Zhong; Rui Li; Jian Shen; Kun Qian; Bin Hu; Björn W. Schuller; Yoshiharu Yamamoto

doi:10.1007/978-981-99-4749-2_18

Corresponding author: Jian Shen

School of Medical and Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Citation (Scopus)

Abstract

Stuttering is a complex speech disorder that disrupts the flow of speech; recognizing persons who stutter (PWS) and understanding their significant struggles are crucial. With advancements in computer vision, deep neural networks offer potential for recognizing stuttering events through image-based features. In this paper, we extract image features of Wavelet Transformation (WT) and Histograms of Oriented Gradients (HOG) from audio signals. We also generate explainable images using Gradient-weighted Class Activation Mapping (Grad-CAM) as input for our final recognition model, an axial attention-based EfficientNetV2, which is trained on the Kassel State of Fluency Dataset (KSoF) to perform 8-class recognition. Our experiments achieved a relative increase in unweighted average recall (UAR) of 4.4% over the ComParE 2022 baseline, demonstrating that the axial attention-based EfficientNetV2, combined with the explainable input, can detect and recognize multiple types of stuttering.
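
The abstract reports its result as a relative increase in unweighted average recall (UAR). As a minimal sketch of how such a figure is typically computed (not the authors' evaluation code), UAR is the mean of the per-class recalls, so every class counts equally regardless of how many samples it has. The labels `class_a` and `class_b`, the function names, and all numbers below are hypothetical and chosen only for illustration; they are not results from the paper.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls (macro-averaged recall)."""
    total = defaultdict(int)    # samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


def relative_increase_percent(new, baseline):
    """Relative (not absolute) improvement of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


# Hypothetical, imbalanced toy data: 6 samples of class_a, 2 of class_b.
y_true = ["class_a"] * 6 + ["class_b"] * 2
y_pred = ["class_a"] * 6 + ["class_a", "class_b"]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical scores, used only to show the relative-increase arithmetic.
gain = relative_increase_percent(0.42, 0.40)
```

On this toy data the recalls are 1.0 for `class_a` and 0.5 for `class_b`, so the UAR is 0.75 while plain accuracy is 0.875. The gap shows why UAR is preferred when classes are imbalanced: the weak minority class is not hidden by the majority class.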

Original language	English
Title of host publication	Advanced Intelligent Computing Technology and Applications - 19th International Conference, ICIC 2023, Proceedings
Editors	De-Shuang Huang, Prashan Premaratne, Baohua Jin, Boyang Qu, Kang-Hyun Jo, Abir Hussain
Publisher	Springer Science and Business Media Deutschland GmbH
Pages	209-220
Number of pages	12
ISBN (Print)	9789819947485
DOIs	https://doi.org/10.1007/978-981-99-4749-2_18
Publication status	Published - 2023
Event	19th International Conference on Intelligent Computing, ICIC 2023, Zhengzhou, China, 10 Aug 2023 → 13 Aug 2023

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	14088 LNCS
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Conference

Conference	19th International Conference on Intelligent Computing, ICIC 2023
Country/Territory	China
City	Zhengzhou
Period	10/08/23 → 13/08/23

Keywords

Histogram of Oriented Gradient
Speech
Stuttering Recognition
Wavelet Transformation

Cite this

Ma, Y., Huang, Y., Yuan, K., Xuan, G., Yu, Y., Zhong, H., Li, R., Shen, J., Qian, K., Hu, B., Schuller, B. W., & Yamamoto, Y. (2023). Explainable Stuttering Recognition Using Axial Attention. In D.-S. Huang, P. Premaratne, B. Jin, B. Qu, K.-H. Jo, & A. Hussain (Eds.), Advanced Intelligent Computing Technology and Applications - 19th International Conference, ICIC 2023, Proceedings (pp. 209-220). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 14088 LNCS). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-981-99-4749-2_18

Ma, Yu ; Huang, Yuting ; Yuan, Kaixiang et al. / Explainable Stuttering Recognition Using Axial Attention. Advanced Intelligent Computing Technology and Applications - 19th International Conference, ICIC 2023, Proceedings. editor / De-Shuang Huang ; Prashan Premaratne ; Baohua Jin ; Boyang Qu ; Kang-Hyun Jo ; Abir Hussain. Springer Science and Business Media Deutschland GmbH, 2023. pp. 209-220 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

@inproceedings{90801b49169f479a8eca614fd2cd4c4e,
title = "Explainable Stuttering Recognition Using Axial Attention",
abstract = "Stuttering is a complex speech disorder that disrupts the flow of speech; recognizing persons who stutter (PWS) and understanding their significant struggles are crucial. With advancements in computer vision, deep neural networks offer potential for recognizing stuttering events through image-based features. In this paper, we extract image features of Wavelet Transformation (WT) and Histograms of Oriented Gradients (HOG) from audio signals. We also generate explainable images using Gradient-weighted Class Activation Mapping (Grad-CAM) as input for our final recognition model, an axial attention-based EfficientNetV2, which is trained on the Kassel State of Fluency Dataset (KSoF) to perform 8-class recognition. Our experiments achieved a relative increase in unweighted average recall (UAR) of 4.4\% over the ComParE 2022 baseline, demonstrating that the axial attention-based EfficientNetV2, combined with the explainable input, can detect and recognize multiple types of stuttering.",
keywords = "Histogram of Oriented Gradient, Speech, Stuttering Recognition, Wavelet Transformation",
author = "Yu Ma and Yuting Huang and Kaixiang Yuan and Guangzhe Xuan and Yongzi Yu and Hengrui Zhong and Rui Li and Jian Shen and Kun Qian and Bin Hu and Schuller, {Bj{\"o}rn W.} and Yoshiharu Yamamoto",
note = "Publisher Copyright: {\textcopyright} 2023, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.; 19th International Conference on Intelligent Computing, ICIC 2023 ; Conference date: 10-08-2023 Through 13-08-2023",
year = "2023",
doi = "10.1007/978-981-99-4749-2_18",
language = "English",
isbn = "9789819947485",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Science and Business Media Deutschland GmbH",
pages = "209--220",
editor = "De-Shuang Huang and Prashan Premaratne and Baohua Jin and Boyang Qu and Kang-Hyun Jo and Abir Hussain",
booktitle = "Advanced Intelligent Computing Technology and Applications - 19th International Conference, ICIC 2023, Proceedings",
address = "Germany",
}

Ma, Y, Huang, Y, Yuan, K, Xuan, G, Yu, Y, Zhong, H, Li, R, Shen, J, Qian, K, Hu, B, Schuller, BW & Yamamoto, Y 2023, Explainable Stuttering Recognition Using Axial Attention. in D-S Huang, P Premaratne, B Jin, B Qu, K-H Jo & A Hussain (eds), Advanced Intelligent Computing Technology and Applications - 19th International Conference, ICIC 2023, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 14088 LNCS, Springer Science and Business Media Deutschland GmbH, pp. 209-220, 19th International Conference on Intelligent Computing, ICIC 2023, Zhengzhou, China, 10/08/23. https://doi.org/10.1007/978-981-99-4749-2_18

Explainable Stuttering Recognition Using Axial Attention. / Ma, Yu; Huang, Yuting; Yuan, Kaixiang et al.
Advanced Intelligent Computing Technology and Applications - 19th International Conference, ICIC 2023, Proceedings. ed. / De-Shuang Huang; Prashan Premaratne; Baohua Jin; Boyang Qu; Kang-Hyun Jo; Abir Hussain. Springer Science and Business Media Deutschland GmbH, 2023. p. 209-220 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 14088 LNCS).

TY  - GEN
T1  - Explainable Stuttering Recognition Using Axial Attention
AU  - Ma, Yu
AU  - Huang, Yuting
AU  - Yuan, Kaixiang
AU  - Xuan, Guangzhe
AU  - Yu, Yongzi
AU  - Zhong, Hengrui
AU  - Li, Rui
AU  - Shen, Jian
AU  - Qian, Kun
AU  - Hu, Bin
AU  - Schuller, Björn W.
AU  - Yamamoto, Yoshiharu
PY  - 2023
Y1  - 2023
N2  - Stuttering is a complex speech disorder that disrupts the flow of speech; recognizing persons who stutter (PWS) and understanding their significant struggles are crucial. With advancements in computer vision, deep neural networks offer potential for recognizing stuttering events through image-based features. In this paper, we extract image features of Wavelet Transformation (WT) and Histograms of Oriented Gradients (HOG) from audio signals. We also generate explainable images using Gradient-weighted Class Activation Mapping (Grad-CAM) as input for our final recognition model, an axial attention-based EfficientNetV2, which is trained on the Kassel State of Fluency Dataset (KSoF) to perform 8-class recognition. Our experiments achieved a relative increase in unweighted average recall (UAR) of 4.4% over the ComParE 2022 baseline, demonstrating that the axial attention-based EfficientNetV2, combined with the explainable input, can detect and recognize multiple types of stuttering.
AB  - Stuttering is a complex speech disorder that disrupts the flow of speech; recognizing persons who stutter (PWS) and understanding their significant struggles are crucial. With advancements in computer vision, deep neural networks offer potential for recognizing stuttering events through image-based features. In this paper, we extract image features of Wavelet Transformation (WT) and Histograms of Oriented Gradients (HOG) from audio signals. We also generate explainable images using Gradient-weighted Class Activation Mapping (Grad-CAM) as input for our final recognition model, an axial attention-based EfficientNetV2, which is trained on the Kassel State of Fluency Dataset (KSoF) to perform 8-class recognition. Our experiments achieved a relative increase in unweighted average recall (UAR) of 4.4% over the ComParE 2022 baseline, demonstrating that the axial attention-based EfficientNetV2, combined with the explainable input, can detect and recognize multiple types of stuttering.
KW  - Histogram of Oriented Gradient
KW  - Speech
KW  - Stuttering Recognition
KW  - Wavelet Transformation
UR  - http://www.scopus.com/inward/record.url?scp=85174804382&partnerID=8YFLogxK
U2  - 10.1007/978-981-99-4749-2_18
DO  - 10.1007/978-981-99-4749-2_18
M3  - Conference contribution
AN  - SCOPUS:85174804382
SN  - 9789819947485
T3  - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP  - 209
EP  - 220
BT  - Advanced Intelligent Computing Technology and Applications - 19th International Conference, ICIC 2023, Proceedings
A2  - Huang, De-Shuang
A2  - Premaratne, Prashan
A2  - Jin, Baohua
A2  - Qu, Boyang
A2  - Jo, Kang-Hyun
A2  - Hussain, Abir
PB  - Springer Science and Business Media Deutschland GmbH
T2  - 19th International Conference on Intelligent Computing, ICIC 2023
Y2  - 10 August 2023 through 13 August 2023
ER  - 

Ma Y, Huang Y, Yuan K, Xuan G, Yu Y, Zhong H et al. Explainable Stuttering Recognition Using Axial Attention. In Huang DS, Premaratne P, Jin B, Qu B, Jo KH, Hussain A, editors, Advanced Intelligent Computing Technology and Applications - 19th International Conference, ICIC 2023, Proceedings. Springer Science and Business Media Deutschland GmbH. 2023. p. 209-220. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-981-99-4749-2_18