Multi-layer feature fusion based image style transfer with arbitrary text condition

Yue Yu; Jingshuo Xing; Nengli Li

doi:10.1016/j.image.2024.117243

Yue Yu^*, Jingshuo Xing, Nengli Li

^*Corresponding author for this work

School of Computer Science

Beijing Institute of Technology

Research output: Contribution to journal › Article › Peer-reviewed

Abstract

Style transfer refers to the conversion of images in two different domains. Compared with the style transfer based on the style image, the image style transfer through the text description is more free and applicable to more practical scenarios. However, the image style transfer method under the text condition needs to be trained and optimized for different text and image inputs each time, resulting in limited style transfer efficiency. Therefore, this paper proposes a multi-layer feature fusion based style transfer method (MlFFST) with arbitrary text condition. To address the problems of distortion and missing semantic content, we also introduce a multi-layer attention normalization module. The experimental results show that the method in this paper can generate stylized results with high quality, good effect and high stability for images and videos. And this method can meet real-time requirements to generate more artistic and aesthetic images and videos.

Original language	English
Article number	117243
Journal	Signal Processing: Image Communication
Volume	132
DOI	https://doi.org/10.1016/j.image.2024.117243
Publication status	Published - Mar 2025

Cite this

Yu, Y., Xing, J., & Li, N. (2025). Multi-layer feature fusion based image style transfer with arbitrary text condition. Signal Processing: Image Communication, 132, Article 117243. https://doi.org/10.1016/j.image.2024.117243

@article{5be27252b21b4b68b3bfbb01733e7f59,

title = "Multi-layer feature fusion based image style transfer with arbitrary text condition",

abstract = "Style transfer refers to the conversion of images in two different domains. Compared with the style transfer based on the style image, the image style transfer through the text description is more free and applicable to more practical scenarios. However, the image style transfer method under the text condition needs to be trained and optimized for different text and image inputs each time, resulting in limited style transfer efficiency. Therefore, this paper proposes a multi-layer feature fusion based style transfer method (MlFFST) with arbitrary text condition. To address the problems of distortion and missing semantic content, we also introduce a multi-layer attention normalization module. The experimental results show that the method in this paper can generate stylized results with high quality, good effect and high stability for images and videos. And this method can meet real-time requirements to generate more artistic and aesthetic images and videos.",

keywords = "Attention mechanism, CLIP, Style transfer, Text",

author = "Yue Yu and Jingshuo Xing and Nengli Li",

note = "Publisher Copyright: {\textcopyright} 2024 Elsevier B.V.",

year = "2025",

month = mar,

doi = "10.1016/j.image.2024.117243",

language = "English",

volume = "132",

journal = "Signal Processing: Image Communication",

issn = "0923-5965",

publisher = "Elsevier B.V.",

}

TY - JOUR

T1 - Multi-layer feature fusion based image style transfer with arbitrary text condition

AU - Yu, Yue

AU - Xing, Jingshuo

AU - Li, Nengli

PY - 2025/3

Y1 - 2025/3

N2 - Style transfer refers to the conversion of images in two different domains. Compared with the style transfer based on the style image, the image style transfer through the text description is more free and applicable to more practical scenarios. However, the image style transfer method under the text condition needs to be trained and optimized for different text and image inputs each time, resulting in limited style transfer efficiency. Therefore, this paper proposes a multi-layer feature fusion based style transfer method (MlFFST) with arbitrary text condition. To address the problems of distortion and missing semantic content, we also introduce a multi-layer attention normalization module. The experimental results show that the method in this paper can generate stylized results with high quality, good effect and high stability for images and videos. And this method can meet real-time requirements to generate more artistic and aesthetic images and videos.

AB - Style transfer refers to the conversion of images in two different domains. Compared with the style transfer based on the style image, the image style transfer through the text description is more free and applicable to more practical scenarios. However, the image style transfer method under the text condition needs to be trained and optimized for different text and image inputs each time, resulting in limited style transfer efficiency. Therefore, this paper proposes a multi-layer feature fusion based style transfer method (MlFFST) with arbitrary text condition. To address the problems of distortion and missing semantic content, we also introduce a multi-layer attention normalization module. The experimental results show that the method in this paper can generate stylized results with high quality, good effect and high stability for images and videos. And this method can meet real-time requirements to generate more artistic and aesthetic images and videos.

KW - Attention mechanism

KW - CLIP

KW - Style transfer

KW - Text

UR - http://www.scopus.com/inward/record.url?scp=85211043204&partnerID=8YFLogxK

U2 - 10.1016/j.image.2024.117243

DO - 10.1016/j.image.2024.117243

M3 - Article

AN - SCOPUS:85211043204

SN - 0923-5965

VL - 132

JO - Signal Processing: Image Communication

JF - Signal Processing: Image Communication

M1 - 117243

ER -
