Consistent Image Layout Editing With Diffusion Models

  • Tao Xia
  • Yudi Zhang
  • Ting Liu
  • Lei Zhang*
  *Corresponding author for this work

Research output: Contribution to journal > Article > peer-review

Abstract

Despite the great success of large-scale text-to-image diffusion models in image generation and image editing, existing methods still struggle to edit the layout of real-world images. The few works that address this issue either fail to adjust the image layout effectively or have difficulty preserving the visual appearance of objects after layout adjustment. To bridge this gap, this paper proposes a novel image layout editing method that not only re-arranges a real-world image to a specified layout, but also keeps the visual appearance of the objects consistent with their state before editing. Concretely, a Multi-Concept Learning scheme is developed to learn the concepts of different objects from a single image; it can be seen as a novel inversion scheme tailored for image layout editing. We then leverage the semantic consistency within intermediate features of diffusion models to project the appearance information of objects onto the target regions, improving the fidelity of objects after editing. Additionally, a novel initialization noise design is adopted to improve the convergence and success rate of layout re-arrangement. The phenomenon of concept entanglement is also analyzed and is resolved by a novel asynchronous editing strategy. Extensive experimental results demonstrate that the proposed method outperforms existing methods in both layout alignment and visual consistency for the task of image layout editing.

Original language: English
Pages (from-to): 6978-6992
Number of pages: 15
Journal: IEEE Transactions on Image Processing
Volume: 34
DOIs
Publication status: Published - 2025

Keywords

  • Image layout editing
  • diffusion models
  • visual consistency
