Adversarial Diffusion Probability Model For Cross-domain Speaker Verification Integrating Contrastive Loss

Xinmei Su, Xiang Xie*, Fengrun Zhang, Chenguang Hu

*Corresponding author for this work

Research output: Contribution to journal › Conference article › peer-review

3 Citations (Scopus)

Abstract

In speaker verification, performance degradation caused by domain mismatch is a common problem when the test domain lies outside the training distribution. In this paper, we present a novel domain transfer network, the Adversarial Diffusion Probabilistic Model (ADPM), to alleviate this problem. More specifically, ADPM is used to transfer mel-spectrograms from the source domain into the target domain. To generate the mel-spectrograms, we propose to regard the diffusion model as the generator, and a discriminator is employed for adversarial training. We also explore a contrastive learning objective to retain the contextual information of the source domain. The generated feature maps and the original feature maps from the source domain are jointly fed into a ResNet34 network for cross-domain speaker verification. We evaluate the proposed techniques on the VOiCES dataset, and our best model achieves a relative 8.94% Equal Error Rate (EER) reduction compared to previous adaptation methods.
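The abstract describes three interacting objectives: a diffusion-style denoising loss for the generator, an adversarial loss from a domain discriminator, and a contrastive loss that ties the generated mel-spectrogram back to its source. The PyTorch sketch below illustrates how such a combination could be wired together in a single training step. It is not the authors' implementation: the tiny networks, the one-step noising stand-in for the forward diffusion process, the loss weights, and the choice to apply the contrastive term directly on the spectrograms are all assumptions made for illustration.

```python
# Hedged sketch (not the paper's code): diffusion-style generator + adversarial
# discriminator + InfoNCE-style contrastive loss on mel-spectrograms.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Denoiser(nn.Module):
    """Tiny stand-in for the diffusion generator's denoising network."""
    def __init__(self, channels=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, channels, 3, padding=1),
        )
    def forward(self, x_t, t):
        # The diffusion step index t is ignored to keep the sketch short.
        return self.net(x_t)

class Discriminator(nn.Module):
    """Patch-style discriminator judging whether a mel-spectrogram looks target-domain."""
    def __init__(self, channels=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 1, 4, padding=1),
        )
    def forward(self, x):
        return self.net(x)

def info_nce(anchor, positive, temperature=0.07):
    """Contrastive loss pulling each generated mel toward its own source mel."""
    a = F.normalize(anchor.flatten(1), dim=1)
    p = F.normalize(positive.flatten(1), dim=1)
    logits = a @ p.t() / temperature          # (B, B) similarity matrix
    labels = torch.arange(a.size(0))          # matching pairs lie on the diagonal
    return F.cross_entropy(logits, labels)

gen, disc = Denoiser(), Discriminator()
opt_g = torch.optim.Adam(gen.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=2e-4)

source_mel = torch.randn(4, 1, 80, 200)       # source-domain mel-spectrograms (random stand-in)
target_mel = torch.randn(4, 1, 80, 200)       # target-domain mel-spectrograms (random stand-in)

# Generator step: denoise a noised source mel, fool the discriminator,
# and keep the output close to the source content via the contrastive term.
x_t = source_mel + torch.randn_like(source_mel)   # crude one-step stand-in for forward diffusion
fake = gen(x_t, t=None)
loss_diff = F.mse_loss(fake, source_mel)          # denoising objective
d_fake_for_g = disc(fake)
loss_adv = F.binary_cross_entropy_with_logits(d_fake_for_g, torch.ones_like(d_fake_for_g))
loss_ctr = info_nce(fake, source_mel)             # retain source context
loss_g = loss_diff + 0.1 * loss_adv + 0.1 * loss_ctr   # weights are illustrative
opt_g.zero_grad(); loss_g.backward(); opt_g.step()

# Discriminator step: real = target-domain mels, fake = generated mels.
d_real = disc(target_mel)
d_fake = disc(fake.detach())
loss_d = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
          + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()
```

In the paper, the generated and original mel-spectrograms are then fed jointly to a ResNet34 speaker-embedding network; that downstream verification stage is omitted from this sketch.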

Original language: English
Pages (from-to): 5336-5340
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2023-August
DOIs
Publication status: Published - 2023
Event: 24th Annual Conference of the International Speech Communication Association, Interspeech 2023 - Dublin, Ireland
Duration: 20 Aug 2023 - 24 Aug 2023

Keywords

  • contrastive learning
  • cross-domain
  • diffusion models
  • speaker verification
