TY - GEN
T1 - Joint Embedding based Text-to-Image Synthesis
AU - Wang, Menglan
AU - Yu, Yue
AU - Li, Benyuan
N1 - Publisher Copyright:
© 2020 IEEE.
PY - 2020/11
Y1 - 2020/11
N2 - Learning a joint embedding between image and text is significant for text-to-image synthesis, as it bridges the semantic gap between the two modalities. Most existing text-to-image generation methods depend on the quality of the text embedding: if the text features are not extracted well, it is difficult for subsequent stages to generate satisfactory images. However, these methods are disturbed by the form of textual expression when extracting text features, so ideal text features cannot be obtained. In this paper, we propose a new text encoder that learns a joint embedding to capture the semantic information shared by real images and the input text, eliminating the interference of textual expression forms. The main difference from existing works is that, for different texts describing the same image, although their expressions differ, they contain the same semantic information, so the proposed text encoder extracts similar semantic features from them. Meanwhile, a special auxiliary classifier for the discriminator is adopted to retain low-level features and generate finely detailed images. We evaluate this work on the Caltech-UCSD Birds 200 (CUB) and Oxford-102 flower datasets; experiments show that our method outperforms state-of-the-art works.
KW - joint embedding
KW - special auxiliary classifier
KW - text-to-image synthesis
UR - http://www.scopus.com/inward/record.url?scp=85098764448&partnerID=8YFLogxK
U2 - 10.1109/ICTAI50040.2020.00074
DO - 10.1109/ICTAI50040.2020.00074
M3 - Conference contribution
AN - SCOPUS:85098764448
T3 - Proceedings - International Conference on Tools with Artificial Intelligence, ICTAI
SP - 432
EP - 436
BT - Proceedings - IEEE 32nd International Conference on Tools with Artificial Intelligence, ICTAI 2020
A2 - Alamaniotis, Miltos
A2 - Pan, Shimei
PB - IEEE Computer Society
T2 - 32nd IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2020
Y2 - 9 November 2020 through 11 November 2020
ER -