Multi-layer feature fusion based image style transfer with arbitrary text condition

Yue Yu*, Jingshuo Xing, Nengli Li

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Style transfer converts images between two different domains. Compared with style transfer guided by a style image, style transfer driven by a text description is freer and applies to more practical scenarios. However, existing text-conditioned style transfer methods must be trained and optimized anew for each text and image input, which limits their efficiency. This paper therefore proposes a multi-layer feature fusion based style transfer method (MlFFST) with arbitrary text conditions. To address distortion and missing semantic content, we also introduce a multi-layer attention normalization module. Experimental results show that the proposed method generates high-quality, stable stylized results for both images and videos, and that it meets real-time requirements while producing more artistic and aesthetically pleasing output.
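The abstract does not spell out the paper's loss formulation, but text-conditioned style transfer methods built on CLIP (e.g., CLIPstyler) commonly use a directional CLIP loss: the shift between the stylized and source image embeddings is aligned with the shift between the style and source text embeddings. The sketch below illustrates that idea with toy numpy vectors standing in for real CLIP features; the function name and 4-d embeddings are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def directional_clip_loss(img_src_emb, img_out_emb, txt_src_emb, txt_sty_emb):
    """Directional CLIP loss: penalize misalignment between the
    image-embedding shift and the text-embedding shift."""
    d_img = img_out_emb - img_src_emb   # direction of change in image space
    d_txt = txt_sty_emb - txt_src_emb   # direction of change in text space
    cos = np.dot(d_img, d_txt) / (
        np.linalg.norm(d_img) * np.linalg.norm(d_txt) + 1e-8
    )
    return 1.0 - cos  # 0 when the two directions are perfectly aligned

# Toy 4-d "embeddings" standing in for real CLIP features.
src_img = np.array([1.0, 0.0, 0.0, 0.0])
out_img = np.array([1.0, 1.0, 0.0, 0.0])  # stylized image moved along dim 1
src_txt = np.array([0.0, 0.0, 1.0, 0.0])  # e.g. embedding of "a photo"
sty_txt = np.array([0.0, 1.0, 1.0, 0.0])  # e.g. embedding of "an oil painting"

loss = directional_clip_loss(src_img, out_img, src_txt, sty_txt)
# Here the image and text shifts point the same way, so the loss is near 0.
```

In practice the embeddings come from a frozen CLIP image and text encoder, and the loss is backpropagated only through the stylization network.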

Original language: English
Article number: 117243
Journal: Signal Processing: Image Communication
Volume: 132
DOI: 10.1016/j.image.2024.117243
Publication status: Published - Mar 2025

Keywords

  • Attention mechanism
  • CLIP
  • Style transfer
  • Text


Cite this

Yu, Y., Xing, J., & Li, N. (2025). Multi-layer feature fusion based image style transfer with arbitrary text condition. Signal Processing: Image Communication, 132, Article 117243. https://doi.org/10.1016/j.image.2024.117243