Breaking through data scarcity: A novel diffusion model approach for snoring sound augmentation and classification

  • Tianrui Jia
  • Haojie Zhang
  • Hanhan Wu
  • Qiyang Sun
  • Xin Jing
  • Boyang Meng
  • Lin Shen
  • Liang Wang
  • Kun Qian*
  • Ye Zhang*
  • Bin Hu
  • Tanja Schultz
  • Björn W. Schuller
  • Yoshiharu Yamamoto

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-review

Abstract

Snoring can originate from various regions of the upper airway. The excitation location is closely linked to the distinctive acoustic characteristics of snore sounds and plays a vital role in sleep monitoring. The Munich-Passau Snore Sound Corpus (MPSSC) is the largest database for snoring-based auxiliary diagnosis, offering valuable data for sleep disorder research and diagnosis. However, MPSSC has a small total number of samples and an uneven class distribution. Some rare diseases are represented by only a few case samples, which is too little data for adequate learning. To address these issues, we propose an end-to-end method for generating high-quality snoring audio for data augmentation. The method combines a Rectified-Flow-based 1D-signal diffusion model that augments data across all classes with an audio-based single diffusion model that augments rare classes. Under our data augmentation framework, higher specificity, sensitivity, and accuracy are achieved in Automatic Snoring Sound Classification (ASSC). We also focus on the explicitness of classification strategies, aiming to demonstrate the high quality of the augmented data and its applicability to downstream tasks. Our work provides comprehensive support for ASSC, enhancing sleep disorder diagnosis assistance and offering new ideas for database research under scarce medical data conditions.

Original language: English
Article number: 109449
Journal: Biomedical Signal Processing and Control
Volume: 116
Publication status: Published - 1 May 2026

Keywords

  • Diffusion model
  • Flow matching
  • Rectified flow
  • Single data diffusion
  • Snore sound generation
