Foreground and background separated image style transfer with a single text condition

Yue Yu*, Jianming Wang, Nengli Li

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

Traditional image-based style transfer requires an additional reference style image, making it less user-friendly. Text-based methods are more convenient but suffer from slow generation, unclear content, and poor quality. In this work, we propose a new style transfer method, SA2-CS (Semantic-Aware and Salient-Attention CLIPStyler), which builds on the Contrastive Language-Image Pre-training (CLIP) model and a salient object detection network. Masks obtained from the salient object detection network guide the style transfer process, and different optimization strategies are applied to the different masked regions. Experiments with diverse content images and style text descriptions demonstrate our method's advantages: the network is easy to train and converges rapidly, and it achieves stable generation results superior to other methods. Our approach addresses over-stylization of the foreground, enhances foreground-background contrast, and enables precise control over style transfer in different semantic regions.
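The abstract describes the core mechanism: a saliency mask separates foreground from background, and a CLIP-guided, text-driven style loss is weighted differently in each region. The sketch below is a minimal, hypothetical PyTorch illustration of that general idea, not the authors' released implementation; the function names, prompts, and the weights w_fg and w_bg are illustrative assumptions.

```python
import torch
import torch.nn.functional as F
import clip  # OpenAI CLIP (pip install git+https://github.com/openai/CLIP.git)

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device)
clip_model.eval()
for p in clip_model.parameters():
    p.requires_grad_(False)  # CLIP stays frozen; only the stylized image/network is optimized

# CLIP's standard input normalization constants.
CLIP_MEAN = torch.tensor([0.48145466, 0.4578275, 0.40821073]).view(1, 3, 1, 1).to(device)
CLIP_STD = torch.tensor([0.26862954, 0.26130258, 0.27577711]).view(1, 3, 1, 1).to(device)

def encode_text(prompt: str) -> torch.Tensor:
    """Normalized CLIP text embedding for a single prompt."""
    tokens = clip.tokenize([prompt]).to(device)
    with torch.no_grad():
        feats = clip_model.encode_text(tokens)
    return F.normalize(feats, dim=-1)

def encode_image(img: torch.Tensor) -> torch.Tensor:
    """Normalized CLIP image embedding; img is (1, 3, H, W) in [0, 1]."""
    img = F.interpolate(img, size=(224, 224), mode="bicubic", align_corners=False)
    img = (img - CLIP_MEAN) / CLIP_STD
    feats = clip_model.encode_image(img.type(clip_model.dtype))
    return F.normalize(feats, dim=-1)

def region_directional_loss(stylized, content, mask, text_dir):
    """CLIP-space directional loss restricted to one region by a soft mask:
    the image-feature shift (stylized minus content) should align with the
    text-feature shift (style prompt minus source prompt)."""
    img_dir = encode_image(stylized * mask) - encode_image(content * mask)
    img_dir = F.normalize(img_dir, dim=-1)
    return (1.0 - (img_dir * text_dir).sum(dim=-1)).mean()

def separated_style_loss(stylized, content, saliency_mask, style_prompt,
                         source_prompt="a photo", w_fg=1.0, w_bg=2.0):
    """Weight the loss differently on the salient foreground (mask) and the
    background (1 - mask), e.g. to damp over-stylization of the foreground."""
    text_dir = F.normalize(encode_text(style_prompt) - encode_text(source_prompt), dim=-1)
    fg_loss = region_directional_loss(stylized, content, saliency_mask, text_dir)
    bg_loss = region_directional_loss(stylized, content, 1.0 - saliency_mask, text_dir)
    return w_fg * fg_loss + w_bg * bg_loss
```

In a full CLIPStyler-style pipeline, such a region-wise term would typically be combined with content-preservation and total-variation losses and used to optimize a lightweight stylization network; the paper additionally tailors the optimization strategy to each mask, as noted in the abstract.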

Original language: English
Article number: 104956
Journal: Image and Vision Computing
Volume: 143
Publication status: Published - March 2024

Keywords

  • CLIP
  • Salient object detection
  • Style transfer
  • Text

