Learning higher representations from pre-trained deep models with data augmentation for the COMPARE 2020 challenge mask task

Tomoya Koike, Kun Qian*, Björn W. Schuller, Yoshiharu Yamamoto

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

14 Citations (Scopus)

Abstract

Human hand-crafted features are generally regarded as expensive, time-consuming, and difficult to design in almost all machine-learning-related tasks. First, such well-designed features rely heavily on human expert domain knowledge, which may restrain collaboration across fields. Second, features extracted in such a brute-force scenario may not transfer easily to another task, which means a new series of features has to be designed. To this end, we introduce a method based on a transfer learning strategy combined with data augmentation techniques for the COMPARE 2020 Challenge Mask Sub-Challenge. Unlike previous studies based mainly on models pre-trained on image data, we use a model pre-trained on large-scale audio data, i.e., AudioSet. In addition, the SpecAugment and mixup methods are used to improve the generalisation of the deep models. Experimental results demonstrate that the best proposed model significantly (p < .001, one-tailed z-test) improves the unweighted average recall (UAR) from 71.8% (baseline) to 76.2% on the test set. Finally, the best result, a UAR of 77.5% on the test set, is achieved by a late fusion of the two best proposed models with the best single model of the baseline.
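The two augmentation techniques named in the abstract are standard and can be sketched briefly. The snippet below is an illustrative NumPy implementation, not the authors' code: a single frequency mask and time mask in the style of SpecAugment, and the Beta-weighted convex combination of two labelled examples used by mixup. Mask widths and the `alpha` parameter are hypothetical defaults, not values reported in the paper.

```python
import numpy as np

def spec_augment(spec, max_freq_mask=8, max_time_mask=16, rng=None):
    """Apply one frequency mask and one time mask to a (freq, time)
    log-mel spectrogram, in the style of SpecAugment (Park et al., 2019).
    Mask widths are drawn uniformly; masked bins are zeroed."""
    rng = rng or np.random.default_rng()
    spec = spec.copy()                      # leave the input untouched
    n_freq, n_time = spec.shape
    f = rng.integers(0, max_freq_mask + 1)  # frequency-mask width
    f0 = rng.integers(0, n_freq - f + 1)    # frequency-mask start bin
    spec[f0:f0 + f, :] = 0.0
    t = rng.integers(0, max_time_mask + 1)  # time-mask width
    t0 = rng.integers(0, n_time - t + 1)    # time-mask start frame
    spec[:, t0:t0 + t] = 0.0
    return spec

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Mix two (spectrogram, one-hot label) pairs with a
    Beta(alpha, alpha)-distributed weight, as in mixup (Zhang et al., 2018)."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1.0 - lam) * x2, lam * y1 + (1.0 - lam) * y2
```

Both operations keep the spectrogram shape unchanged, so they can be applied on the fly inside a training data loader before the batch is fed to the pre-trained network.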

Original language: English
Title of host publication: Interspeech 2020
Publisher: International Speech Communication Association
Pages: 2047-2051
Number of pages: 5
ISBN (Print): 9781713820697
DOIs
Publication status: Published - 2020
Externally published: Yes
Event: 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020 - Shanghai, China
Duration: 25 Oct 2020 - 29 Oct 2020

Publication series

Name: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2020-October
ISSN (Print): 2308-457X
ISSN (Electronic): 1990-9772

Conference

Conference: 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020
Country/Territory: China
City: Shanghai
Period: 25/10/20 - 29/10/20

Keywords

  • Computational Paralinguistics
  • Data Augmentation
  • Deep Learning
  • Speech under Mask
  • Transfer Learning
