TY - JOUR
T1 - Consistent Image Layout Editing With Diffusion Models
AU - Xia, Tao
AU - Zhang, Yudi
AU - Liu, Ting
AU - Zhang, Lei
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - Despite the great success of large-scale text-to-image diffusion models in image generation and editing, existing methods still struggle to edit the layout of real-world images. Although a few works have been developed to address this issue, they either fail to adjust the image layout effectively or struggle to preserve the visual appearance of objects after layout adjustment. To bridge this gap, this paper proposes a novel image layout editing method that not only re-arranges a real-world image according to a specified layout, but also ensures that the visual appearance of the objects remains consistent with their original state prior to editing. Concretely, a Multi-Concept Learning scheme is developed to learn the concepts of different objects from a single image, which can be seen as a novel inversion scheme tailored for image layout editing. Then, we leverage the semantic consistency within the intermediate features of diffusion models to project the appearance information of objects onto the target regions, improving the fidelity of objects after editing. Additionally, a novel initialization noise design is adopted to improve the convergence and success rate of layout re-arrangement. The phenomenon of concept entanglement is also analyzed and resolved by a novel asynchronous editing strategy. Extensive experimental results demonstrate that the proposed method outperforms existing methods in both layout alignment and visual consistency for the task of image layout editing.
AB - Despite the great success of large-scale text-to-image diffusion models in image generation and editing, existing methods still struggle to edit the layout of real-world images. Although a few works have been developed to address this issue, they either fail to adjust the image layout effectively or struggle to preserve the visual appearance of objects after layout adjustment. To bridge this gap, this paper proposes a novel image layout editing method that not only re-arranges a real-world image according to a specified layout, but also ensures that the visual appearance of the objects remains consistent with their original state prior to editing. Concretely, a Multi-Concept Learning scheme is developed to learn the concepts of different objects from a single image, which can be seen as a novel inversion scheme tailored for image layout editing. Then, we leverage the semantic consistency within the intermediate features of diffusion models to project the appearance information of objects onto the target regions, improving the fidelity of objects after editing. Additionally, a novel initialization noise design is adopted to improve the convergence and success rate of layout re-arrangement. The phenomenon of concept entanglement is also analyzed and resolved by a novel asynchronous editing strategy. Extensive experimental results demonstrate that the proposed method outperforms existing methods in both layout alignment and visual consistency for the task of image layout editing.
KW - Image layout editing
KW - diffusion models
KW - visual consistency
UR - https://www.scopus.com/pages/publications/105020400670
U2 - 10.1109/TIP.2025.3623869
DO - 10.1109/TIP.2025.3623869
M3 - Article
C2 - 41150226
AN - SCOPUS:105020400670
SN - 1057-7149
VL - 34
SP - 6978
EP - 6992
JO - IEEE Transactions on Image Processing
JF - IEEE Transactions on Image Processing
ER -