Explainable Stuttering Recognition Using Axial Attention

Yu Ma, Yuting Huang, Kaixiang Yuan, Guangzhe Xuan, Yongzi Yu, Hengrui Zhong, Rui Li, Jian Shen*, Kun Qian, Bin Hu, Björn W. Schuller, Yoshiharu Yamamoto

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Citation (Scopus)

Abstract

Stuttering is a complex speech disorder that disrupts the flow of speech, and recognizing persons who stutter (PWS) and understanding their significant struggles is crucial. With advances in computer vision, deep neural networks offer potential for recognizing stuttering events through image-based features. In this paper, we extract Wavelet Transformation (WT) and Histogram of Oriented Gradients (HOG) image features from audio signals. We also generate explainable images using Gradient-weighted Class Activation Mapping (Grad-CAM) as input for our final recognition model, an axial attention-based EfficientNetV2, which is trained on the Kassel State of Fluency Dataset (KSoF) to perform 8-class recognition. Our experiments achieved a relative increase of 4.4% in unweighted average recall (UAR) over the ComParE 2022 baseline, demonstrating that the axial attention-based EfficientNetV2, combined with the explainable input, can detect and recognize multiple types of stuttering.
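As a reading aid for the reported metric, the following is a minimal sketch of how UAR and a relative percentage increase are commonly computed. UAR is the mean of the per-class recalls, so each class counts equally however many samples it has. The function names and the toy labels are hypothetical illustrations and are not taken from the paper or from KSoF.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls; every class counts equally regardless of size."""
    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


def relative_increase(new, baseline):
    """Relative percentage increase of `new` over `baseline`."""
    return (new - baseline) / baseline * 100.0


# Toy labels for illustration only (assumed, not data from the paper).
y_true = ["block", "block", "block", "block", "filler", "filler"]
y_pred = ["block", "block", "block", "filler", "filler", "block"]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.3f}, accuracy = {accuracy:.3f}")
```

On imbalanced data such as this toy example, UAR and plain accuracy differ, since accuracy is dominated by the larger class while UAR weights both classes equally.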

Original language: English
Title of host publication: Advanced Intelligent Computing Technology and Applications - 19th International Conference, ICIC 2023, Proceedings
Editors: De-Shuang Huang, Prashan Premaratne, Baohua Jin, Boyang Qu, Kang-Hyun Jo, Abir Hussain
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 209-220
Number of pages: 12
ISBN (Print): 9789819947485
Publication status: Published - 2023
Event: 19th International Conference on Intelligent Computing, ICIC 2023 - Zhengzhou, China
Duration: 10 Aug 2023 to 13 Aug 2023

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 14088 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 19th International Conference on Intelligent Computing, ICIC 2023
Country/Territory: China
City: Zhengzhou
Period: 10/08/23 to 13/08/23

Keywords

  • Histogram of Oriented Gradient
  • Speech
  • Stuttering Recognition
  • Wavelet Transformation
