Are You Speaking with a Mask? An Investigation on Attention Based Deep Temporal Convolutional Neural Networks for Mask Detection Task

Yu Qiao, Kun Qian, Ziping Zhao*, Xiaojing Zhao

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-reviewed

Abstract

At the time of writing, COVID-19, a global epidemic, has affected more than 200 countries and territories and led to more than 694,000 deaths. Wearing a mask is one of the most convenient, cheap, and efficient precautions. Moreover, guaranteeing speech quality when the speaker wears a mask is crucial in real-world telecommunication technologies. To this end, the goal of the ComParE 2020 mask condition recognition of speakers sub-challenge is to recognize whether a speaker is wearing a facial mask. In this work, we present three modeling methods under the deep neural network framework: a Convolutional Recurrent Neural Network (CRNN), Convolutional Temporal Convolutional Networks (CTCNs), and CTCNs combined with utterance-level features. Furthermore, we use cycle mode to fill the samples to further enhance system performance. In the CTCNs model, we tried different network depths. Finally, the experimental results demonstrate the effectiveness of the CTCNs network structure, which reaches an unweighted average recall (UAR) of 66.4% on the development set. This is higher than the baseline result of 64.4% obtained with the S2SAE+SVM network (significant at p < 0.001 by a one-tailed z-test), which demonstrates the good performance of our proposed network.
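
The abstract reports results as unweighted average recall (UAR) and mentions filling samples in "cycle mode". As a rough illustration only, the sketch below computes UAR as the plain mean of per-class recalls and pads a sequence by repeating it from its start. The function names `unweighted_average_recall` and `cycle_pad` are hypothetical, and reading "cycle mode" as repetition up to a fixed length is an assumption; the abstract does not specify the authors' exact procedure.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls, so each class counts equally regardless of size."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        totals[truth] += 1
        if truth == pred:
            hits[truth] += 1
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


def cycle_pad(samples, target_len):
    """Repeat a sequence from its start until it has target_len items.

    Assumption: this is one plausible reading of "cycle mode" filling.
    Sequences longer than target_len are truncated.
    """
    if not samples:
        raise ValueError("cannot cycle-pad an empty sequence")
    repeats = -(-target_len // len(samples))  # ceiling division
    return (list(samples) * repeats)[:target_len]


if __name__ == "__main__":
    # Toy labels, not data from the paper.
    y_true = ["mask", "mask", "mask", "clear"]
    y_pred = ["mask", "mask", "clear", "clear"]
    print(unweighted_average_recall(y_true, y_pred))
    print(cycle_pad([1, 2, 3], 7))
```

Because UAR averages recall over classes rather than over samples, a classifier that always predicts the majority class scores only 50% on a two-class task, which is why it is a common choice when class sizes differ.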

Original language: English
Title of host publication: Proceedings of the 8th Conference on Sound and Music Technology - Selected Papers from CSMT
Editors: Xi Shao, Kun Qian, Li Zhou, Xin Wang, Ziping Zhao
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 163-174
Number of pages: 12
ISBN (Print): 9789811616488
Publication status: Published - 2021
Externally published: Yes
Event: 8th Conference on Sound and Music Technology, CSMT 2020 - Taiyuan, China
Duration: 5 Nov 2020 to 8 Nov 2020

Publication series

Name: Lecture Notes in Electrical Engineering
Volume: 761 LNEE
ISSN (Print): 1876-1100
ISSN (Electronic): 1876-1119

Conference

Conference: 8th Conference on Sound and Music Technology, CSMT 2020
Country/Territory: China
City: Taiyuan
Period: 5/11/20 to 8/11/20

Keywords

  • Computational paralinguistics
  • Deep learning framework
  • Mask condition recognition
  • Speech recognition
