Transformer with Prior Language Knowledge for Image Captioning

Daisong Yan, Wenxin Yu*, Zhiqiang Zhang, Jun Gong

*Corresponding author of this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Citations (Scopus)

Abstract

The Transformer architecture represents the state of the art in image captioning tasks. However, even though the Transformer uses positional encodings to encode sentences, its output is still not grammatical enough. To improve image captioning performance, we present the Prior Language Knowledge Transformer (PLKT), a Transformer-based model that can integrate learned prior language knowledge for image captioning. In our proposal, when the model predicts the next word, it depends not only on the previously generated sequence but also on prior language knowledge. To obtain this prior language knowledge, we embed a learnable memory vector inside the self-attention. Meanwhile, we use reinforcement learning to fine-tune the model during training. To demonstrate the effectiveness of PLKT, we compare our approach with other recent image captioning methods in our experiments. In objective terms, our proposal increases the CIDEr score of the baseline by 0.6 points on the "Karpathy" test split of the COCO2014 dataset. In subjective terms, the sentences generated by our approach are clearly more grammatical than the baseline's.
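The abstract does not give the exact formulation, but the core idea it describes, extending self-attention with learnable memory slots so that attention can also attend to stored prior knowledge, can be sketched as follows. All names, shapes, and the single-head, framework-free formulation are illustrative assumptions, not the paper's implementation:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def memory_attention(Q, K, V, mem_k, mem_v):
    """Scaled dot-product attention whose keys/values are augmented with
    learnable memory slots (a hypothetical sketch of the PLKT idea).

    Q, K, V: lists of d-dimensional vectors from the input sequence.
    mem_k, mem_v: memory key/value vectors; in a real model these would
    be trainable parameters shared across inputs.
    """
    d = len(Q[0])
    K_aug = K + mem_k  # sequence keys followed by memory keys
    V_aug = V + mem_v  # sequence values followed by memory values
    out = []
    for q in Q:
        # Attention scores over both sequence positions and memory slots.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in K_aug]
        w = softmax(scores)
        # Weighted sum of the augmented values.
        out.append([sum(wj * v[i] for wj, v in zip(w, V_aug))
                    for i in range(len(V_aug[0]))])
    return out
```

With empty memory lists this reduces to ordinary scaled dot-product attention; with memory slots present, each output mixes in the memory values, which is how learned prior knowledge could influence next-word prediction.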

Original language: English
Title of host publication: Neural Information Processing - 28th International Conference, ICONIP 2021, Proceedings
Editors: Teddy Mantoro, Minho Lee, Media Anugerah Ayu, Kok Wai Wong, Achmad Nizar Hidayanto
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 40-51
Number of pages: 12
ISBN (Print): 9783030922696
DOI
Publication status: Published - 2021
Event: 28th International Conference on Neural Information Processing, ICONIP 2021 - Virtual, Online
Duration: 8 Dec 2021 - 12 Dec 2021

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13109 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 28th International Conference on Neural Information Processing, ICONIP 2021
Virtual, Online
Period: 8/12/21 - 12/12/21
