Text to Image Generation with Conformer-GAN

Zhiyu Deng, Wenxin Yu*, Lu Che, Shiyu Chen, Zhiqiang Zhang, Jun Shang, Peng Chen, Jun Gong

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-reviewed

1 Citation (Scopus)

Abstract

Text-to-image generation (T2I) has been a popular research field in recent years, and its goal is to generate corresponding photorealistic images through natural language text descriptions. Existing T2I models are mostly based on generative adversarial networks, but it is still very challenging to guarantee the semantic consistency between a given textual description and generated natural images. To address this problem, we propose a concise and practical novel framework, Conformer-GAN. Specifically, we propose the Conformer block, consisting of the Convolutional Neural Network (CNN) and Transformer branches. The CNN branch is used to generate images conditionally from noise. The Transformer branch continuously focuses on the relevant words in natural language descriptions and fuses the sentence and word information to guide the CNN branch for image generation. Our approach can better merge global and local representations to improve the semantic consistency between textual information and synthetic images. Importantly, our Conformer-GAN can generate natural and realistic 512 × 512 images. Extensive experiments on the challenging public benchmark datasets CUB bird and COCO demonstrate that our method outperforms recent state-of-the-art methods both in terms of generated image quality and text-image semantic consistency.
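The abstract does not specify how the Transformer branch fuses sentence and word information, so the following is only a minimal sketch of the general idea under stated assumptions: each spatial feature from the CNN branch acts as a query over the word embeddings (scaled dot-product attention), and the resulting word context plus a sentence vector are added to the feature. The function names, the additive fusion, the shared feature dimension, and the use of the mean word embedding as a stand-in sentence vector are all hypothetical and are not taken from the paper.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def word_attention(pixel_feats, word_embs):
    """Each spatial feature (query) attends over the word embeddings (keys and values)."""
    scale = math.sqrt(len(word_embs[0]))
    contexts, weights = [], []
    for q in pixel_feats:
        w = softmax([dot(q, k) / scale for k in word_embs])
        # Attention-weighted sum of the word embeddings for this location.
        ctx = [sum(w[j] * word_embs[j][d] for j in range(len(word_embs)))
               for d in range(len(q))]
        contexts.append(ctx)
        weights.append(w)
    return contexts, weights


def fuse_text_guidance(pixel_feats, word_embs, sentence_emb):
    """Hypothetical fusion: add the word context and the sentence vector to each feature."""
    contexts, weights = word_attention(pixel_feats, word_embs)
    fused = [[f + c + s for f, c, s in zip(feat, ctx, sentence_emb)]
             for feat, ctx in zip(pixel_feats, contexts)]
    return fused, weights


random.seed(0)
dim, n_pixels, n_words = 4, 6, 3  # toy sizes, chosen only for illustration
pixel_feats = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_pixels)]
word_embs = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_words)]
# Assumption: the mean word embedding stands in for a sentence embedding.
sentence_emb = [sum(col) / n_words for col in zip(*word_embs)]

fused, weights = fuse_text_guidance(pixel_feats, word_embs, sentence_emb)
```

In this sketch, `weights[i]` shows how strongly spatial location `i` attends to each word, which is the sense in which a word-level branch can steer local image content while the sentence vector supplies global context.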

Original language: English
Title of host publication: Neural Information Processing - 30th International Conference, ICONIP 2023, Proceedings
Editors: Biao Luo, Long Cheng, Zheng-Guang Wu, Hongyi Li, Chaojie Li
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 3-14
Number of pages: 12
ISBN (Print): 9789819980727
DOI
Publication status: Published - 2024
Externally published
Event: 30th International Conference on Neural Information Processing, ICONIP 2023 - Changsha, China
Duration: 20 Nov 2023 to 23 Nov 2023

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 14451 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 30th International Conference on Neural Information Processing, ICONIP 2023
Country/Territory: China
City: Changsha
Period: 20/11/23 to 23/11/23
