TY - GEN
T1 - Exploiting Knowledge Embedding to Improve the Description for Image Captioning
AU - Song, Dandan
AU - Peng, Cuimei
AU - Yang, Huan
AU - Liao, Lejian
N1 - Publisher Copyright:
© 2021, Springer Nature Singapore Pte Ltd.
PY - 2021
Y1 - 2021
N2 - Most existing methods for image captioning are based on the encoder-decoder framework, which directly translates visual features into sentences without exploiting the commonsense knowledge available in knowledge graphs. Inspired by the success of information retrieval and question answering systems that leverage prior knowledge, we explore a knowledge embedding approach for image captioning. In this paper, we propose a Knowledge Embedding with Attention on Attention (KE-AoA) method for image captioning, which judges whether and how strongly objects are related and augments the semantic correlations and constraints between them. The KE-AoA method combines a knowledge base method (TransE) and a text-based method (Skip-gram), adding external knowledge graph information (triplets) to the language model as a regularization term to guide the learning of word vectors. It then employs the AoA module to model the relations among different objects. As more inherent relations and commonsense knowledge are learned, the model generates better image descriptions. Experiments on the MSCOCO dataset show a significant improvement over existing methods and validate the effectiveness of our prior-knowledge-based approach.
AB - Most existing methods for image captioning are based on the encoder-decoder framework, which directly translates visual features into sentences without exploiting the commonsense knowledge available in knowledge graphs. Inspired by the success of information retrieval and question answering systems that leverage prior knowledge, we explore a knowledge embedding approach for image captioning. In this paper, we propose a Knowledge Embedding with Attention on Attention (KE-AoA) method for image captioning, which judges whether and how strongly objects are related and augments the semantic correlations and constraints between them. The KE-AoA method combines a knowledge base method (TransE) and a text-based method (Skip-gram), adding external knowledge graph information (triplets) to the language model as a regularization term to guide the learning of word vectors. It then employs the AoA module to model the relations among different objects. As more inherent relations and commonsense knowledge are learned, the model generates better image descriptions. Experiments on the MSCOCO dataset show a significant improvement over existing methods and validate the effectiveness of our prior-knowledge-based approach.
KW - Image captioning
KW - Knowledge embedding
KW - Knowledge representation
KW - Multi-head attention
UR - http://www.scopus.com/inward/record.url?scp=85111003797&partnerID=8YFLogxK
U2 - 10.1007/978-981-16-1964-9_25
DO - 10.1007/978-981-16-1964-9_25
M3 - Conference contribution
AN - SCOPUS:85111003797
SN - 9789811619632
T3 - Communications in Computer and Information Science
SP - 312
EP - 321
BT - Knowledge Graph and Semantic Computing
A2 - Chen, Huajun
A2 - Liu, Kang
A2 - Sun, Yizhou
A2 - Wang, Suge
A2 - Hou, Lei
PB - Springer Science and Business Media Deutschland GmbH
T2 - 5th China Conference on Knowledge Graph and Semantic Computing, CCKS 2020
Y2 - 12 November 2020 through 15 November 2020
ER -