PFGM++ Combined with Stochastic Regeneration for Speech Enhancement

Xiao Cao; Shenghui Zhao

doi:10.1109/ICSIP61881.2024.10671434

Xiao Cao, Shenghui Zhao^*

^*Corresponding author for this work

School of Information and Electronics

Beijing Institute of Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Diffusion models have been applied to speech enhancement because of their capability to learn complex data distributions. However, the extended Poisson flow generative model (PFGM++) outperforms diffusion models in terms of robustness. In this work, we introduce PFGM++ to speech enhancement and propose SR-PFGM++, which combines the stochastic regeneration model (StoRM) with PFGM++ and samples using an ordinary differential equation (ODE). Test results on the VoiceBank-DEMAND dataset show that SR-PFGM++ achieves higher performance with fewer sampling steps than StoRM. We also performed a mismatch test on the TIMIT+NOISE92 dataset, and the results show the strong generalization capability of SR-PFGM++.

Original language	English
Title of host publication	2024 9th International Conference on Signal and Image Processing, ICSIP 2024
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	267-271
Number of pages	5
ISBN (Electronic)	9798350350920
DOIs	https://doi.org/10.1109/ICSIP61881.2024.10671434
Publication status	Published - 2024
Event	9th International Conference on Signal and Image Processing, ICSIP 2024 - Hybrid, Nanjing, China Duration: 12 Jul 2024 → 14 Jul 2024

Publication series

Name	2024 9th International Conference on Signal and Image Processing, ICSIP 2024

Conference

Conference	9th International Conference on Signal and Image Processing, ICSIP 2024
Country/Territory	China
City	Hybrid, Nanjing
Period	12/07/24 → 14/07/24

Keywords

PFGM++
score-based generative model
speech enhancement
stochastic regeneration

Access to Document

10.1109/ICSIP61881.2024.10671434

Cite this

Cao, X., & Zhao, S. (2024). PFGM++ Combined with Stochastic Regeneration for Speech Enhancement. In 2024 9th International Conference on Signal and Image Processing, ICSIP 2024 (pp. 267-271). (2024 9th International Conference on Signal and Image Processing, ICSIP 2024). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICSIP61881.2024.10671434

@inproceedings{5f1a385c2edb458f9b0f0f75fd17f901,

title = "PFGM++ Combined with Stochastic Regeneration for Speech Enhancement",

abstract = "Diffusion models have been applied to speech enhancement because of their capability to learn complex data distributions. However, the extended Poisson flow generative model (PFGM++) outperforms diffusion models in terms of robustness. In this work, we introduce PFGM++ to speech enhancement and propose SR-PFGM++, which combines the stochastic regeneration model (StoRM) with PFGM++ and samples using an ordinary differential equation (ODE). Test results on the VoiceBank-DEMAND dataset show that SR-PFGM++ achieves higher performance with fewer sampling steps than StoRM. We also performed a mismatch test on the TIMIT+NOISE92 dataset, and the results show the strong generalization capability of SR-PFGM++.",

keywords = "PFGM++, score-based generative model, speech enhancement, stochastic regeneration",

author = "Xiao Cao and Shenghui Zhao",

note = "Publisher Copyright: {\textcopyright} 2024 IEEE.; 9th International Conference on Signal and Image Processing, ICSIP 2024 ; Conference date: 12-07-2024 Through 14-07-2024",

year = "2024",

doi = "10.1109/ICSIP61881.2024.10671434",

language = "English",

series = "2024 9th International Conference on Signal and Image Processing, ICSIP 2024",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "267--271",

booktitle = "2024 9th International Conference on Signal and Image Processing, ICSIP 2024",

address = "United States",

}

Cao, X & Zhao, S 2024, PFGM++ Combined with Stochastic Regeneration for Speech Enhancement. in 2024 9th International Conference on Signal and Image Processing, ICSIP 2024. 2024 9th International Conference on Signal and Image Processing, ICSIP 2024, Institute of Electrical and Electronics Engineers Inc., pp. 267-271, 9th International Conference on Signal and Image Processing, ICSIP 2024, Hybrid, Nanjing, China, 12/07/24. https://doi.org/10.1109/ICSIP61881.2024.10671434

PFGM++ Combined with Stochastic Regeneration for Speech Enhancement. / Cao, Xiao; Zhao, Shenghui.
2024 9th International Conference on Signal and Image Processing, ICSIP 2024. Institute of Electrical and Electronics Engineers Inc., 2024. p. 267-271 (2024 9th International Conference on Signal and Image Processing, ICSIP 2024).

TY - GEN

T1 - PFGM++ Combined with Stochastic Regeneration for Speech Enhancement

AU - Cao, Xiao

AU - Zhao, Shenghui

PY - 2024

Y1 - 2024

AB - Diffusion models have been applied to speech enhancement because of their capability to learn complex data distributions. However, the extended Poisson flow generative model (PFGM++) outperforms diffusion models in terms of robustness. In this work, we introduce PFGM++ to speech enhancement and propose SR-PFGM++, which combines the stochastic regeneration model (StoRM) with PFGM++ and samples using an ordinary differential equation (ODE). Test results on the VoiceBank-DEMAND dataset show that SR-PFGM++ achieves higher performance with fewer sampling steps than StoRM. We also performed a mismatch test on the TIMIT+NOISE92 dataset, and the results show the strong generalization capability of SR-PFGM++.

KW - PFGM++

KW - score-based generative model

KW - speech enhancement

KW - stochastic regeneration

UR - http://www.scopus.com/inward/record.url?scp=85206098775&partnerID=8YFLogxK

U2 - 10.1109/ICSIP61881.2024.10671434

DO - 10.1109/ICSIP61881.2024.10671434

M3 - Conference contribution

AN - SCOPUS:85206098775

T3 - 2024 9th International Conference on Signal and Image Processing, ICSIP 2024

SP - 267

EP - 271

BT - 2024 9th International Conference on Signal and Image Processing, ICSIP 2024

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 9th International Conference on Signal and Image Processing, ICSIP 2024

Y2 - 12 July 2024 through 14 July 2024

ER -
