TY - JOUR
T1 - Foreground and background separated image style transfer with a single text condition
AU - Yu, Yue
AU - Wang, Jianming
AU - Li, Nengli
N1 - Publisher Copyright:
© 2023
PY - 2024/3
Y1 - 2024/3
N2 - Traditional image-based style transfer requires additional reference style images, making it less user-friendly. Text-based methods are more convenient but suffer from issues such as slow generation, unclear content, and poor quality. In this work, we propose a new style transfer method, SA2-CS (Semantic-Aware and Salient Attention CLIPStyler), which is based on the Contrastive Language-Image Pre-training (CLIP) model and a salient object detection network. Masks obtained from the salient object detection network guide the style transfer process, and different optimization strategies are applied according to each mask. Extensive experiments with diverse content images and style text descriptions demonstrate our method's advantages: the network is easily trainable and converges rapidly, and it achieves stable, superior generation results compared with other methods. Our approach addresses over-stylization in the foreground, enhances foreground-background contrast, and enables precise control over style transfer in different semantic regions.
AB - Traditional image-based style transfer requires additional reference style images, making it less user-friendly. Text-based methods are more convenient but suffer from issues such as slow generation, unclear content, and poor quality. In this work, we propose a new style transfer method, SA2-CS (Semantic-Aware and Salient Attention CLIPStyler), which is based on the Contrastive Language-Image Pre-training (CLIP) model and a salient object detection network. Masks obtained from the salient object detection network guide the style transfer process, and different optimization strategies are applied according to each mask. Extensive experiments with diverse content images and style text descriptions demonstrate our method's advantages: the network is easily trainable and converges rapidly, and it achieves stable, superior generation results compared with other methods. Our approach addresses over-stylization in the foreground, enhances foreground-background contrast, and enables precise control over style transfer in different semantic regions.
KW - CLIP
KW - Salient object detection
KW - Style transfer
KW - Text
UR - http://www.scopus.com/inward/record.url?scp=85185838291&partnerID=8YFLogxK
U2 - 10.1016/j.imavis.2024.104956
DO - 10.1016/j.imavis.2024.104956
M3 - Article
AN - SCOPUS:85185838291
SN - 0262-8856
VL - 143
JO - Image and Vision Computing
JF - Image and Vision Computing
M1 - 104956
ER -