Adversarial Diffusion Probability Model For Cross-domain Speaker Verification Integrating Contrastive Loss

Xinmei Su, Xiang Xie*, Fengrun Zhang, Chenguang Hu

*Corresponding author for this work

Research output: Contribution to journal › Conference article › peer-review

3 Citations (Scopus)

Abstract

In speaker verification, performance degradation caused by domain mismatch is a common problem when the test domain lies outside the training distribution. In this paper, we present a novel domain transfer network, the Adversarial Diffusion Probabilistic Model (ADPM), to alleviate this problem. More specifically, ADPM is used to transfer mel-spectrograms from the source domain into the target domain. To generate the mel-spectrograms, we propose to regard the diffusion model as the generator, and a discriminator is employed for adversarial training. We also explore a contrastive learning objective to retain the contextual information of the source domain. The generated feature maps and the original feature maps from the source domain are jointly fed into a ResNet34 network for cross-domain speaker verification. We evaluate the proposed techniques on the VOiCES dataset, and our best model achieves a relative 8.94% Equal Error Rate (EER) reduction compared to previous adaptation methods.
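The abstract describes three interacting objectives: a diffusion-style denoising loss for the generator, an adversarial loss from a domain discriminator, and a contrastive loss that ties the generated mel-spectrogram back to its source. The PyTorch sketch below illustrates how such a combination could be wired together in a single training step. It is not the authors' implementation: the tiny networks, the one-step noising stand-in for the forward diffusion process, the loss weights, and the choice to apply the contrastive term directly on the spectrograms are all assumptions made for illustration.

```python
# Hedged sketch (not the paper's code): diffusion-style generator + adversarial
# discriminator + InfoNCE-style contrastive loss on mel-spectrograms.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Denoiser(nn.Module):
    """Tiny stand-in for the diffusion generator's denoising network."""
    def __init__(self, channels=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, channels, 3, padding=1),
        )
    def forward(self, x_t, t):
        # The diffusion step index t is ignored to keep the sketch short.
        return self.net(x_t)

class Discriminator(nn.Module):
    """Patch-style discriminator judging whether a mel-spectrogram looks target-domain."""
    def __init__(self, channels=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 1, 4, padding=1),
        )
    def forward(self, x):
        return self.net(x)

def info_nce(anchor, positive, temperature=0.07):
    """Contrastive loss pulling each generated mel toward its own source mel."""
    a = F.normalize(anchor.flatten(1), dim=1)
    p = F.normalize(positive.flatten(1), dim=1)
    logits = a @ p.t() / temperature          # (B, B) similarity matrix
    labels = torch.arange(a.size(0))          # matching pairs lie on the diagonal
    return F.cross_entropy(logits, labels)

gen, disc = Denoiser(), Discriminator()
opt_g = torch.optim.Adam(gen.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=2e-4)

source_mel = torch.randn(4, 1, 80, 200)       # source-domain mel-spectrograms (random stand-in)
target_mel = torch.randn(4, 1, 80, 200)       # target-domain mel-spectrograms (random stand-in)

# Generator step: denoise a noised source mel, fool the discriminator,
# and keep the output close to the source content via the contrastive term.
x_t = source_mel + torch.randn_like(source_mel)   # crude one-step stand-in for forward diffusion
fake = gen(x_t, t=None)
loss_diff = F.mse_loss(fake, source_mel)          # denoising objective
d_fake_for_g = disc(fake)
loss_adv = F.binary_cross_entropy_with_logits(d_fake_for_g, torch.ones_like(d_fake_for_g))
loss_ctr = info_nce(fake, source_mel)             # retain source context
loss_g = loss_diff + 0.1 * loss_adv + 0.1 * loss_ctr   # weights are illustrative
opt_g.zero_grad(); loss_g.backward(); opt_g.step()

# Discriminator step: real = target-domain mels, fake = generated mels.
d_real = disc(target_mel)
d_fake = disc(fake.detach())
loss_d = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
          + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()
```

In the paper, the generated and original mel-spectrograms are then fed jointly to a ResNet34 speaker-embedding network; that downstream verification stage is omitted from this sketch.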

Original language: English
Pages (from-to): 5336-5340
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2023-August
DOIs
Publication status: Published - 2023
Event: 24th Annual Conference of the International Speech Communication Association, Interspeech 2023 - Dublin, Ireland
Duration: 20 Aug 2023 - 24 Aug 2023

Keywords

  • contrastive learning
  • cross-domain
  • diffusion models
  • speaker verification
