TY - GEN
T1 - SR-PFGM++ Based Consistency Model for Speech Enhancement
AU - Cao, Xiao
AU - Zhao, Shenghui
AU - Hu, Yajing
AU - Wang, Jing
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
AB - Diffusion models and the extended Poisson flow generative model (PFGM++) have been applied to speech enhancement. Both generate samples by solving a stochastic differential equation (SDE) or an ordinary differential equation (ODE), which usually requires a large number of sampling steps. To address this, we introduce consistency models, which allow high-quality one-step generation without adversarial training. Specifically, building on our previous work, SR-PFGM++ (PFGM++ combined with stochastic regeneration) is distilled to train a consistency model, yielding the proposed consistency model for speech enhancement. Test results on the VoiceBank-DEMAND dataset show that the proposed model significantly reduces inference time relative to SR-PFGM++ while maintaining comparable performance. In addition, mismatched-condition test results on the TIMIT+NOISE92 dataset demonstrate the generalization ability of the proposed model.
KW - consistency distillation
KW - consistency models
KW - PFGM++
KW - speech enhancement
KW - stochastic regeneration
UR - https://www.scopus.com/pages/publications/86000001140
U2 - 10.1109/ICSIDP62679.2024.10869119
DO - 10.1109/ICSIDP62679.2024.10869119
M3 - Conference contribution
AN - SCOPUS:86000001140
T3 - IEEE International Conference on Signal, Information and Data Processing, ICSIDP 2024
BT - IEEE International Conference on Signal, Information and Data Processing, ICSIDP 2024
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2nd IEEE International Conference on Signal, Information and Data Processing, ICSIDP 2024
Y2 - 22 November 2024 through 24 November 2024
ER -