Text to Image Generation with Conformer-GAN

Zhiyu Deng, Wenxin Yu*, Lu Che, Shiyu Chen, Zhiqiang Zhang, Jun Shang, Peng Chen, Jun Gong

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Citations (Scopus)

Abstract

Text-to-image generation (T2I) has been a popular research field in recent years, and its goal is to generate corresponding photorealistic images through natural language text descriptions. Existing T2I models are mostly based on generative adversarial networks, but it is still very challenging to guarantee the semantic consistency between a given textual description and generated natural images. To address this problem, we propose a concise and practical novel framework, Conformer-GAN. Specifically, we propose the Conformer block, consisting of the Convolutional Neural Network (CNN) and Transformer branches. The CNN branch is used to generate images conditionally from noise. The Transformer branch continuously focuses on the relevant words in natural language descriptions and fuses the sentence and word information to guide the CNN branch for image generation. Our approach can better merge global and local representations to improve the semantic consistency between textual information and synthetic images. Importantly, our Conformer-GAN can generate natural and realistic 512 × 512 images. Extensive experiments on the challenging public benchmark datasets CUB bird and COCO demonstrate that our method outperforms recent state-of-the-art methods both in terms of generated image quality and text-image semantic consistency.
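The abstract describes a dual-branch design in which a Transformer branch attends to relevant caption words and guides the CNN image-generation branch. The sketch below is a hedged, minimal illustration of that idea in plain NumPy: image-region features (standing in for the CNN branch) attend to word embeddings via scaled dot-product cross-attention and are fused by a residual sum. All shapes, names, and the fusion rule here are illustrative assumptions, not the paper's actual Conformer block.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def word_attention_fusion(region_feats, word_embs):
    """Toy cross-attention: each image region attends to caption words.

    region_feats: (R, D) features, standing in for the CNN branch output.
    word_embs:    (T, D) word embeddings from a text encoder.
    Returns (R, D) fused features (residual sum) -- an illustrative
    stand-in for how text information could guide image generation.
    """
    d = region_feats.shape[1]
    scores = region_feats @ word_embs.T / np.sqrt(d)  # (R, T) similarities
    attn = softmax(scores, axis=-1)                   # attention over words
    context = attn @ word_embs                        # (R, D) word context
    return region_feats + context                     # residual fusion

# Toy usage with random features: 16 regions, 5 words, dimension 8.
rng = np.random.default_rng(0)
fused = word_attention_fusion(rng.normal(size=(16, 8)),
                              rng.normal(size=(5, 8)))
print(fused.shape)  # (16, 8)
```

In the actual model, both branches are learned jointly inside the GAN generator; this sketch only shows the cross-modal attention pattern that lets text tokens influence spatial image features.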

Original language: English
Title of host publication: Neural Information Processing - 30th International Conference, ICONIP 2023, Proceedings
Editors: Biao Luo, Long Cheng, Zheng-Guang Wu, Hongyi Li, Chaojie Li
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 3-14
Number of pages: 12
ISBN (Print): 9789819980727
DOI: 10.1007/978-981-99-8073-4_1
Publication status: Published - 2024
Externally published: Yes
Event: 30th International Conference on Neural Information Processing, ICONIP 2023 - Changsha, China
Duration: 20 Nov 2023 - 23 Nov 2023

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 14451 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 30th International Conference on Neural Information Processing, ICONIP 2023
Country/Territory: China
City: Changsha
Period: 20/11/23 - 23/11/23

Keywords

  • Computer Vision
  • Deep Learning
  • Generative Adversarial Networks
  • Text-to-Image Synthesis


Cite this

Deng, Z., Yu, W., Che, L., Chen, S., Zhang, Z., Shang, J., Chen, P., & Gong, J. (2024). Text to Image Generation with Conformer-GAN. In B. Luo, L. Cheng, Z.-G. Wu, H. Li, & C. Li (Eds.), Neural Information Processing - 30th International Conference, ICONIP 2023, Proceedings (pp. 3-14). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 14451 LNCS). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-981-99-8073-4_1