Prompt-guided Precise Audio Editing with Diffusion Models

Manjie Xu, Chenxing Li*, Duzhen Zhang, Dan Su, Wei Liang*, Dong Yu*

*Corresponding author for this work

Research output: Contribution to journal › Conference article › peer-review

Abstract

Audio editing involves manipulating arbitrary audio content with precise control. Although text-guided diffusion models have made significant advances in text-to-audio generation, they still lack a flexible and precise way to modify target events within an audio track. We present a novel approach, referred to as Prompt-guided Precise Audio Editing (PPAE), which serves as a general module for diffusion models and enables precise audio editing. The editing is driven by the input textual prompt alone and is entirely training-free. We exploit the cross-attention maps of diffusion models to facilitate accurate local editing and employ a hierarchical local-global pipeline to ensure a smoother editing process. Experimental results highlight the effectiveness of our method across various editing tasks.
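The abstract's core mechanism, reusing cross-attention maps to localize edits, can be illustrated with a minimal sketch. The code below is not the authors' PPAE implementation; it is an assumed, simplified cross-attention layer (in the style of prompt-to-prompt editing) whose attention probabilities are recorded during a pass with the source prompt and re-injected during a pass with the edit prompt, so that unchanged text tokens keep their temporal footprint in the audio latents. All class names, tensor shapes, and hyperparameters are illustrative.

```python
# Hedged sketch (NOT the paper's code): cross-attention recording and injection
# for prompt-guided local editing, as described at a high level in the abstract.
import torch

class CrossAttention(torch.nn.Module):
    """Single cross-attention layer whose attention probabilities can be
    recorded on a source-prompt pass and overridden on an edit-prompt pass."""
    def __init__(self, dim: int, text_dim: int):
        super().__init__()
        self.to_q = torch.nn.Linear(dim, dim, bias=False)
        self.to_k = torch.nn.Linear(text_dim, dim, bias=False)
        self.to_v = torch.nn.Linear(text_dim, dim, bias=False)
        self.stored_attn = None   # attention map saved from the source pass
        self.override = False     # when True, reuse stored_attn instead

    def forward(self, x, text_emb):
        q = self.to_q(x)                        # (B, N_audio_frames, dim)
        k = self.to_k(text_emb)                 # (B, N_text_tokens, dim)
        v = self.to_v(text_emb)
        attn = torch.softmax(q @ k.transpose(-1, -2) / q.shape[-1] ** 0.5, dim=-1)
        if self.override and self.stored_attn is not None:
            attn = self.stored_attn             # inject source-prompt attention
        else:
            self.stored_attn = attn.detach()    # record for later injection
        return attn @ v

# Usage: denoise once with the source prompt (maps are recorded), then flip
# override on and denoise with the edit prompt; only tokens that differ between
# prompts can change the output, which localizes the edit.
layer = CrossAttention(dim=64, text_dim=32)
latents = torch.randn(1, 100, 64)    # assumed latent audio frames
src_text = torch.randn(1, 8, 32)     # embedded source prompt (assumed length)
edit_text = torch.randn(1, 8, 32)    # embedded edit prompt, same token count
_ = layer(latents, src_text)         # source pass: attention maps stored
layer.override = True
edited = layer(latents, edit_text)   # edit pass: source attention re-injected
```

In a full diffusion model this record-then-inject step would run inside every cross-attention layer at each denoising timestep, with the paper's hierarchical local-global pipeline deciding where injected maps apply; the single-layer, single-step version above only conveys the mechanism.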

Original language: English
Pages (from-to): 55126-55143
Number of pages: 18
Journal: Proceedings of Machine Learning Research
Volume: 235
Publication status: Published - 2024
Event: 41st International Conference on Machine Learning, ICML 2024 - Vienna, Austria
Duration: 21 Jul 2024 - 27 Jul 2024
