TY - GEN
T1 - Text-to-Image Synthesis with Threshold-Equipped Matching-Aware GAN
AU - Shang, Jun
AU - Yu, Wenxin
AU - Che, Lu
AU - Zhang, Zhiqiang
AU - Cai, Hongjie
AU - Deng, Zhiyu
AU - Gong, Jun
AU - Chen, Peng
N1 - Publisher Copyright:
© 2024, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
PY - 2024
Y1 - 2024
N2 - In this paper, we propose ETMA-GAN, a novel matching-aware generative adversarial network equipped with a threshold, for text-to-image synthesis. By filtering out inaccurate negative samples, the discriminator can more accurately determine whether the generator synthesizes images that correctly match their text descriptions. In addition, to strengthen the discriminator’s ability to capture key semantic information, a fine-grained word-level supervisor is constructed, which in turn drives the generator toward high-quality synthesis of image details. Extensive experiments and ablation studies on the Caltech-UCSD Birds 200 (CUB) and Microsoft Common Objects in Context (MS COCO) datasets demonstrate the effectiveness and superiority of the proposed method over existing approaches. In both subjective and objective evaluations, the proposed model outperforms recent state-of-the-art methods, particularly in synthesizing images with greater realism and closer conformity to the text descriptions.
AB - In this paper, we propose ETMA-GAN, a novel matching-aware generative adversarial network equipped with a threshold, for text-to-image synthesis. By filtering out inaccurate negative samples, the discriminator can more accurately determine whether the generator synthesizes images that correctly match their text descriptions. In addition, to strengthen the discriminator’s ability to capture key semantic information, a fine-grained word-level supervisor is constructed, which in turn drives the generator toward high-quality synthesis of image details. Extensive experiments and ablation studies on the Caltech-UCSD Birds 200 (CUB) and Microsoft Common Objects in Context (MS COCO) datasets demonstrate the effectiveness and superiority of the proposed method over existing approaches. In both subjective and objective evaluations, the proposed model outperforms recent state-of-the-art methods, particularly in synthesizing images with greater realism and closer conformity to the text descriptions.
KW - Computer Vision
KW - Generative Adversarial Networks
KW - Matching-Aware
KW - Text-to-Image Synthesis
UR - http://www.scopus.com/inward/record.url?scp=85178576791&partnerID=8YFLogxK
U2 - 10.1007/978-981-99-8148-9_13
DO - 10.1007/978-981-99-8148-9_13
M3 - Conference contribution
AN - SCOPUS:85178576791
SN - 9789819981472
T3 - Communications in Computer and Information Science
SP - 161
EP - 172
BT - Neural Information Processing - 30th International Conference, ICONIP 2023, Proceedings
A2 - Luo, Biao
A2 - Cheng, Long
A2 - Wu, Zheng-Guang
A2 - Li, Hongyi
A2 - Li, Chaojie
PB - Springer Science and Business Media Deutschland GmbH
T2 - 30th International Conference on Neural Information Processing, ICONIP 2023
Y2 - 20 November 2023 through 23 November 2023
ER -
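
The abstract above describes ETMA-GAN's core idea only at a high level: dropping unreliable negative (mismatched) text-image pairs before applying the matching-aware discriminator loss. Purely as a rough illustration of that general idea, and not the authors' actual formulation, the following minimal PyTorch sketch discards mismatched pairs whose precomputed image-text similarity exceeds a threshold tau; every function and variable name here is a hypothetical assumption based on the abstract alone.

    # Hypothetical sketch of a matching-aware discriminator loss with
    # threshold-based filtering of negative (mismatched) text-image pairs.
    # NOT the ETMA-GAN authors' code: names, signatures, and the threshold
    # rule are illustrative assumptions drawn only from the abstract.
    import torch
    import torch.nn.functional as F

    def matching_aware_d_loss(match_logits, mismatch_logits, mismatch_sim, tau=0.5):
        """Discriminator loss over (image, text) pairs.

        match_logits:    discriminator outputs for images paired with their
                         matching captions.
        mismatch_logits: discriminator outputs for images paired with
                         randomly drawn captions (candidate negatives).
        mismatch_sim:    precomputed image-text similarity for the mismatched
                         pairs (e.g., cosine similarity of encoder features).
        tau:             threshold; negatives whose similarity exceeds tau are
                         treated as unreliable and dropped, since the random
                         caption may in fact describe the image.
        """
        # Matching pairs should be classified as consistent (label 1).
        loss_match = F.binary_cross_entropy_with_logits(
            match_logits, torch.ones_like(match_logits))

        # Keep only negatives that are confidently mismatched (label 0).
        keep = mismatch_sim < tau
        if keep.any():
            neg = mismatch_logits[keep]
            loss_mismatch = F.binary_cross_entropy_with_logits(
                neg, torch.zeros_like(neg))
        else:
            loss_mismatch = match_logits.new_zeros(())

        return loss_match + loss_mismatch

Under this reading, filtering shields the discriminator from contradictory supervision (a "negative" caption that actually fits the image), which is consistent with the abstract's claim that removing inaccurate negative samples lets the discriminator judge text-image consistency more accurately.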