Text to Image Generation with Conformer-GAN

Zhiyu Deng; Wenxin Yu; Lu Che; Shiyu Chen; Zhiqiang Zhang; Jun Shang; Peng Chen; Jun Gong

doi:10.1007/978-981-99-8073-4_1

Zhiyu Deng, Wenxin Yu^*, Lu Che, Shiyu Chen, Zhiqiang Zhang, Jun Shang, Peng Chen, Jun Gong

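^*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Citations (Scopus)

Abstract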

Text-to-image generation (T2I) has been a popular research field in recent years, and its goal is to generate corresponding photorealistic images through natural language text descriptions. Existing T2I models are mostly based on generative adversarial networks, but it is still very challenging to guarantee the semantic consistency between a given textual description and generated natural images. To address this problem, we propose a concise and practical novel framework, Conformer-GAN. Specifically, we propose the Conformer block, consisting of the Convolutional Neural Network (CNN) and Transformer branches. The CNN branch is used to generate images conditionally from noise. The Transformer branch continuously focuses on the relevant words in natural language descriptions and fuses the sentence and word information to guide the CNN branch for image generation. Our approach can better merge global and local representations to improve the semantic consistency between textual information and synthetic images. Importantly, our Conformer-GAN can generate natural and realistic 512 × 512 images. Extensive experiments on the challenging public benchmark datasets CUB bird and COCO demonstrate that our method outperforms recent state-of-the-art methods both in terms of generated image quality and text-image semantic consistency.
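The abstract gives no implementation details, so the following is only a minimal sketch of the general idea it describes: an image-feature branch guided by attention over word embeddings plus a sentence-level vector. It is not the authors' Conformer block. The function names `attend_words` and `conformer_step`, the additive fusion, and the toy 2-dimensional vectors are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend_words(region, words):
    """Scaled dot-product attention of one image region over word vectors.

    Returns the attention-weighted word context and the weights themselves.
    """
    scale = math.sqrt(len(region))
    weights = softmax([dot(region, w) / scale for w in words])
    context = [
        sum(wt * w[i] for wt, w in zip(weights, words))
        for i in range(len(region))
    ]
    return context, weights


def conformer_step(regions, words, sentence):
    """Hypothetical fusion step (an assumption, not the paper's design):
    add the word-level context (local) and the sentence vector (global)
    to each region feature."""
    fused = []
    for region in regions:
        context, _ = attend_words(region, words)
        fused.append([r + c + s for r, c, s in zip(region, context, sentence)])
    return fused


# Toy inputs: two image regions, two word embeddings, one sentence embedding.
regions = [[1.0, 0.0], [0.0, 1.0]]
words = [[4.0, 0.0], [0.0, 4.0]]
sentence = [0.5, 0.5]
fused = conformer_step(regions, words, sentence)
```

In this toy setup each region attends most strongly to the word vector it aligns with, which is the kind of word-level guidance the abstract attributes to the Transformer branch; a real model would use learned projections and convolutional feature maps instead of fixed vectors.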

Original language	English
Title of host publication	Neural Information Processing - 30th International Conference, ICONIP 2023, Proceedings
Editors	Biao Luo, Long Cheng, Zheng-Guang Wu, Hongyi Li, Chaojie Li
Publisher	Springer Science and Business Media Deutschland GmbH
Pages	3-14
Number of pages	12
ISBN (Print)	9789819980727
DOI	https://doi.org/10.1007/978-981-99-8073-4_1
Publication status	Published - 2024
Externally published	Yes
Event	30th International Conference on Neural Information Processing, ICONIP 2023 - Changsha, China. Duration: 20 Nov 2023 → 23 Nov 2023

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	14451 LNCS
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Conference

Conference	30th International Conference on Neural Information Processing, ICONIP 2023
Country/Territory	China
City	Changsha
Period	20/11/23 → 23/11/23

Access to Document

10.1007/978-981-99-8073-4_1

Other files and links

Link to publication in Scopus

Cite this

Deng, Z., Yu, W., Che, L., Chen, S., Zhang, Z., Shang, J., Chen, P., & Gong, J. (2024). Text to Image Generation with Conformer-GAN. In B. Luo, L. Cheng, Z.-G. Wu, H. Li, & C. Li (Eds.), Neural Information Processing - 30th International Conference, ICONIP 2023, Proceedings (pp. 3-14). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 14451 LNCS). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-981-99-8073-4_1

Deng, Zhiyu ; Yu, Wenxin ; Che, Lu et al. / Text to Image Generation with Conformer-GAN. Neural Information Processing - 30th International Conference, ICONIP 2023, Proceedings. editor / Biao Luo ; Long Cheng ; Zheng-Guang Wu ; Hongyi Li ; Chaojie Li. Springer Science and Business Media Deutschland GmbH, 2024. pp. 3-14 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

@inproceedings{7d9849de3cc242cab029619a7fc79154,

title = "Text to Image Generation with Conformer-GAN",

abstract = "Text-to-image generation (T2I) has been a popular research field in recent years, and its goal is to generate corresponding photorealistic images through natural language text descriptions. Existing T2I models are mostly based on generative adversarial networks, but it is still very challenging to guarantee the semantic consistency between a given textual description and generated natural images. To address this problem, we propose a concise and practical novel framework, Conformer-GAN. Specifically, we propose the Conformer block, consisting of the Convolutional Neural Network (CNN) and Transformer branches. The CNN branch is used to generate images conditionally from noise. The Transformer branch continuously focuses on the relevant words in natural language descriptions and fuses the sentence and word information to guide the CNN branch for image generation. Our approach can better merge global and local representations to improve the semantic consistency between textual information and synthetic images. Importantly, our Conformer-GAN can generate natural and realistic 512 × 512 images. Extensive experiments on the challenging public benchmark datasets CUB bird and COCO demonstrate that our method outperforms recent state-of-the-art methods both in terms of generated image quality and text-image semantic consistency.",

keywords = "Computer Vision, Deep Learning, Generative Adversarial Networks, Text-to-Image Synthesis",

author = "Zhiyu Deng and Wenxin Yu and Lu Che and Shiyu Chen and Zhiqiang Zhang and Jun Shang and Peng Chen and Jun Gong",

note = "Publisher Copyright: {\textcopyright} 2024, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.; 30th International Conference on Neural Information Processing, ICONIP 2023 ; Conference date: 20-11-2023 Through 23-11-2023",

year = "2024",

doi = "10.1007/978-981-99-8073-4_1",

language = "English",

isbn = "9789819980727",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

publisher = "Springer Science and Business Media Deutschland GmbH",

pages = "3--14",

editor = "Biao Luo and Long Cheng and Zheng-Guang Wu and Hongyi Li and Chaojie Li",

booktitle = "Neural Information Processing - 30th International Conference, ICONIP 2023, Proceedings",

address = "Germany",

}

Deng, Z, Yu, W, Che, L, Chen, S, Zhang, Z, Shang, J, Chen, P & Gong, J 2024, Text to Image Generation with Conformer-GAN. in B Luo, L Cheng, Z-G Wu, H Li & C Li (eds), Neural Information Processing - 30th International Conference, ICONIP 2023, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 14451 LNCS, Springer Science and Business Media Deutschland GmbH, pp. 3-14, 30th International Conference on Neural Information Processing, ICONIP 2023, Changsha, China, 20/11/23. https://doi.org/10.1007/978-981-99-8073-4_1

Text to Image Generation with Conformer-GAN. / Deng, Zhiyu; Yu, Wenxin; Che, Lu et al.
Neural Information Processing - 30th International Conference, ICONIP 2023, Proceedings. editor / Biao Luo; Long Cheng; Zheng-Guang Wu; Hongyi Li; Chaojie Li. Springer Science and Business Media Deutschland GmbH, 2024. pp. 3-14 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 14451 LNCS).

TY - GEN

T1 - Text to Image Generation with Conformer-GAN

AU - Deng, Zhiyu

AU - Yu, Wenxin

AU - Che, Lu

AU - Chen, Shiyu

AU - Zhang, Zhiqiang

AU - Shang, Jun

AU - Chen, Peng

AU - Gong, Jun

PY - 2024

Y1 - 2024

AB - Text-to-image generation (T2I) has been a popular research field in recent years, and its goal is to generate corresponding photorealistic images through natural language text descriptions. Existing T2I models are mostly based on generative adversarial networks, but it is still very challenging to guarantee the semantic consistency between a given textual description and generated natural images. To address this problem, we propose a concise and practical novel framework, Conformer-GAN. Specifically, we propose the Conformer block, consisting of the Convolutional Neural Network (CNN) and Transformer branches. The CNN branch is used to generate images conditionally from noise. The Transformer branch continuously focuses on the relevant words in natural language descriptions and fuses the sentence and word information to guide the CNN branch for image generation. Our approach can better merge global and local representations to improve the semantic consistency between textual information and synthetic images. Importantly, our Conformer-GAN can generate natural and realistic 512 × 512 images. Extensive experiments on the challenging public benchmark datasets CUB bird and COCO demonstrate that our method outperforms recent state-of-the-art methods both in terms of generated image quality and text-image semantic consistency.

KW - Computer Vision

KW - Deep Learning

KW - Generative Adversarial Networks

KW - Text-to-Image Synthesis

UR - http://www.scopus.com/inward/record.url?scp=85178613144&partnerID=8YFLogxK

U2 - 10.1007/978-981-99-8073-4_1

DO - 10.1007/978-981-99-8073-4_1

M3 - Conference contribution

AN - SCOPUS:85178613144

SN - 9789819980727

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 3

EP - 14

BT - Neural Information Processing - 30th International Conference, ICONIP 2023, Proceedings

A2 - Luo, Biao

A2 - Cheng, Long

A2 - Wu, Zheng-Guang

A2 - Li, Hongyi

A2 - Li, Chaojie

PB - Springer Science and Business Media Deutschland GmbH

T2 - 30th International Conference on Neural Information Processing, ICONIP 2023

Y2 - 20 November 2023 through 23 November 2023

ER -

Deng Z, Yu W, Che L, Chen S, Zhang Z, Shang J et al. Text to Image Generation with Conformer-GAN. In Luo B, Cheng L, Wu ZG, Li H, Li C, editors, Neural Information Processing - 30th International Conference, ICONIP 2023, Proceedings. Springer Science and Business Media Deutschland GmbH. 2024. p. 3-14. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-981-99-8073-4_1
