Abstract
Language-guided fashion image editing is challenging: edits to fashion images are local and require high precision, while natural language cannot provide precise visual guidance. In this paper, we propose LucIE, a novel unsupervised language-guided local image editing method for fashion images. LucIE adopts and modifies a recent text-to-image synthesis network, DF-GAN, as its backbone. However, the synthesis backbone often changes the global structure of the input image, making local image editing impractical. To increase structural consistency between input and edited images, we propose the Content-Preserving Fusion Module (CPFM). Unlike existing fusion modules, CPFM avoids iterative refinement of visual feature maps and instead accumulates additive modifications on RGB maps. LucIE performs local image editing explicitly, via language-guided image segmentation and mask-guided image blending, while using only image-text pairs. Results on the DeepFashion dataset show that LucIE achieves state-of-the-art performance. Compared with previous methods, images generated by LucIE also exhibit fewer artifacts. We provide visualizations and perform ablation studies to validate LucIE and CPFM. We also demonstrate and analyze the limitations of LucIE to provide a better understanding of the method.
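As a rough illustration of the two mechanisms named in the abstract, the sketch below combines an additive update on the RGB map (in the spirit of CPFM, rather than iterative refinement of feature maps) with mask-guided blending so the edit stays local. This is a minimal sketch under stated assumptions, not the paper's actual architecture; the names `blend_edit`, `rgb_delta`, and the tensor shapes are hypothetical.

```python
import torch

def blend_edit(input_rgb: torch.Tensor,
               rgb_delta: torch.Tensor,
               mask: torch.Tensor) -> torch.Tensor:
    """Hypothetical sketch of CPFM-style local editing.

    input_rgb: (B, 3, H, W) source image in [0, 1]
    rgb_delta: (B, 3, H, W) additive RGB modification from a generator
    mask:      (B, 1, H, W) soft language-derived segmentation mask in [0, 1]
    """
    # Accumulate an additive modification on the RGB map instead of
    # iteratively refining intermediate feature maps.
    edited = (input_rgb + rgb_delta).clamp(0.0, 1.0)
    # Mask-guided blending: keep the input unchanged outside the edit
    # region, which preserves the global structure of the image.
    return mask * edited + (1.0 - mask) * input_rgb
```

In this reading, structural consistency comes for free wherever the mask is zero, since those pixels are copied directly from the input.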
| Original language | English |
| --- | --- |
| Pages (from-to) | 179-194 |
| Number of pages | 16 |
| Journal | Computational Visual Media |
| Volume | 11 |
| Issue number | 1 |
| DOIs | |
| Publication status | Published - 2025 |
| Externally published | Yes |
Keywords
- content preservation
- deep learning
- fashion images
- language-guided image editing
- local image editing