Exploiting Knowledge Embedding to Improve the Description for Image Captioning

Dandan Song*, Cuimei Peng, Huan Yang, Lejian Liao

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Most existing methods for image captioning are based on the encoder-decoder framework, which directly translates visual features into sentences without exploiting the commonsense knowledge available in knowledge graphs. Inspired by the success of information retrieval and question answering systems that leverage prior knowledge, we explore a knowledge embedding approach for image captioning. In this paper, we propose a Knowledge Embedding with Attention on Attention (KE-AoA) method for image captioning, which judges whether and how strongly objects are related and augments the semantic correlations and constraints between them. The KE-AoA method combines a knowledge base embedding method (TransE) with a text embedding method (Skip-gram), injecting external knowledge graph information (triplets) into the language model as a regularization term that guides the learning of word vectors. It then employs the AoA module to model the relations among different objects. As more inherent relations and commonsense knowledge are learned, the model can generate better image descriptions. Experiments on the MSCOCO dataset show a significant improvement over existing methods and validate the effectiveness of our prior-knowledge-based approach.
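The abstract does not include an implementation, so the following PyTorch sketch only illustrates the two building blocks it names, under our own assumptions about shapes and hyperparameters: an Attention-on-Attention (AoA) block in the style of Huang et al. (2019), and a TransE margin-ranking loss that could serve as the knowledge-graph regularization term added alongside the Skip-gram language-model objective. All class and function names below are illustrative, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionOnAttention(nn.Module):
    """Attention-on-Attention (AoA) block: a gated refinement of a
    multi-head attention result, as described by Huang et al. (2019)."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.mha = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Both linear maps act on the concatenation [query; attended value].
        self.info = nn.Linear(2 * dim, dim)   # "information" vector
        self.gate = nn.Linear(2 * dim, dim)   # sigmoid attention gate

    def forward(self, query, key, value):
        attended, _ = self.mha(query, key, value)      # (B, T, dim)
        fused = torch.cat([query, attended], dim=-1)   # (B, T, 2*dim)
        # Gate the information vector element-wise.
        return torch.sigmoid(self.gate(fused)) * self.info(fused)


def transe_margin_loss(head, relation, tail, neg_tail, margin: float = 1.0):
    """TransE margin-ranking loss: push ||h + r - t|| below ||h + r - t'||
    by at least `margin`, for a corrupted (negative) tail t'."""
    pos = torch.norm(head + relation - tail, p=2, dim=-1)
    neg = torch.norm(head + relation - neg_tail, p=2, dim=-1)
    return F.relu(margin + pos - neg).mean()


# Usage sketch: refine detector region features with AoA, and add the
# TransE term to the captioning loss as a knowledge regularizer
# (the 0.1 weight is an assumed hyperparameter, not from the paper).
aoa = AttentionOnAttention(dim=512)
regions = torch.randn(2, 36, 512)            # e.g. object region features
refined = aoa(regions, regions, regions)     # self-attention over objects

h, r, t, t_neg = (torch.randn(16, 512) for _ in range(4))
knowledge_reg = 0.1 * transe_margin_loss(h, r, t, t_neg)
# total_loss = caption_loss + knowledge_reg
```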

Original language: English
Title of host publication: Knowledge Graph and Semantic Computing
Subtitle of host publication: Knowledge Graph and Cognitive Intelligence - 5th China Conference, CCKS 2020, Revised Selected Papers
Editors: Huajun Chen, Kang Liu, Yizhou Sun, Suge Wang, Lei Hou
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 312-321
Number of pages: 10
ISBN (Print): 9789811619632
DOI: 10.1007/978-981-16-1964-9_25
Publication status: Published - 2021
Event: 5th China Conference on Knowledge Graph and Semantic Computing, CCKS 2020 - Nanchang, China
Duration: 12 Nov 2020 - 15 Nov 2020

Publication series

Name: Communications in Computer and Information Science
Volume: 1356 CCIS
ISSN (Print): 1865-0929
ISSN (Electronic): 1865-0937

Conference

Conference: 5th China Conference on Knowledge Graph and Semantic Computing, CCKS 2020
Country/Territory: China
City: Nanchang
Period: 12/11/20 - 15/11/20

Keywords

  • Image captioning
  • Knowledge embedding
  • Knowledge representation
  • Multi-head attention

Cite this

Song, D., Peng, C., Yang, H., & Liao, L. (2021). Exploiting Knowledge Embedding to Improve the Description for Image Captioning. In H. Chen, K. Liu, Y. Sun, S. Wang, & L. Hou (Eds.), Knowledge Graph and Semantic Computing: Knowledge Graph and Cognitive Intelligence - 5th China Conference, CCKS 2020, Revised Selected Papers (pp. 312-321). (Communications in Computer and Information Science; Vol. 1356 CCIS). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-981-16-1964-9_25