TY - JOUR
T1 - Foreground and background separated image style transfer with a single text condition
AU - Yu, Yue
AU - Wang, Jianming
AU - Li, Nengli
N1 - Publisher Copyright:
© 2023
PY - 2024/3
Y1 - 2024/3
N2 - Traditional image-based style transfer requires additional reference style images, making it less user-friendly. Text-based methods are more convenient but suffer from issues such as slow generation, unclear content, and poor quality. In this work, we propose a new style transfer method, SA2-CS (Semantic-Aware and Salient Attention CLIPStyler), which is based on the Contrastive Language-Image Pre-training (CLIP) model and a salient object detection network. Masks obtained from the salient object detection network guide the style transfer process, and different optimization strategies are applied according to each mask. Extensive experiments with diverse content images and style text descriptions demonstrate our method's advantages: the network is easily trainable and converges rapidly, and it achieves stable, superior generation results compared with other methods. Our approach addresses over-stylization in the foreground, enhances foreground-background contrast, and enables precise control over style transfer in different semantic regions.
AB - Traditional image-based style transfer requires additional reference style images, making it less user-friendly. Text-based methods are more convenient but suffer from issues such as slow generation, unclear content, and poor quality. In this work, we propose a new style transfer method, SA2-CS (Semantic-Aware and Salient Attention CLIPStyler), which is based on the Contrastive Language-Image Pre-training (CLIP) model and a salient object detection network. Masks obtained from the salient object detection network guide the style transfer process, and different optimization strategies are applied according to each mask. Extensive experiments with diverse content images and style text descriptions demonstrate our method's advantages: the network is easily trainable and converges rapidly, and it achieves stable, superior generation results compared with other methods. Our approach addresses over-stylization in the foreground, enhances foreground-background contrast, and enables precise control over style transfer in different semantic regions.
KW - CLIP
KW - Salient object detection
KW - Style transfer
KW - Text
UR - http://www.scopus.com/inward/record.url?scp=85185838291&partnerID=8YFLogxK
U2 - 10.1016/j.imavis.2024.104956
DO - 10.1016/j.imavis.2024.104956
M3 - Article
AN - SCOPUS:85185838291
SN - 0262-8856
VL - 143
JO - Image and Vision Computing
JF - Image and Vision Computing
M1 - 104956
ER -