Exploiting Knowledge Embedding to Improve the Description for Image Captioning

Dandan Song*, Cuimei Peng, Huan Yang, Lejian Liao

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Most existing methods for image captioning are based on the encoder-decoder framework, which directly translates visual features into sentences without exploiting the commonsense knowledge available in knowledge graphs. Inspired by the success of information retrieval and question answering systems that leverage prior knowledge, we explore a knowledge embedding approach for image captioning. In this paper, we propose a Knowledge Embedding with Attention on Attention (KE-AoA) method for image captioning, which judges whether and how strongly objects are related and augments the semantic correlations and constraints between them. The KE-AoA method combines a knowledge base embedding method (TransE) with a text embedding method (Skip-gram), injecting external knowledge graph information (triplets) into the language model as a regularization term that guides the learning of word vectors. It then employs the AoA module to model the relations among different objects. As more inherent relations and commonsense knowledge are learned, the model can generate better image descriptions. Experiments on the MSCOCO dataset show a significant improvement over existing methods and validate the effectiveness of our prior-knowledge-based approach.
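The abstract does not include an implementation, so the following PyTorch sketch only illustrates the two building blocks it names, under our own assumptions about shapes and hyperparameters: an Attention-on-Attention (AoA) block in the style of Huang et al. (2019), and a TransE margin-ranking loss that could serve as the knowledge-graph regularization term added alongside the Skip-gram language-model objective. All class and function names below are illustrative, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionOnAttention(nn.Module):
    """Attention-on-Attention (AoA) block: a gated refinement of a
    multi-head attention result, as described by Huang et al. (2019)."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.mha = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Both linear maps act on the concatenation [query; attended value].
        self.info = nn.Linear(2 * dim, dim)   # "information" vector
        self.gate = nn.Linear(2 * dim, dim)   # sigmoid attention gate

    def forward(self, query, key, value):
        attended, _ = self.mha(query, key, value)      # (B, T, dim)
        fused = torch.cat([query, attended], dim=-1)   # (B, T, 2*dim)
        # Gate the information vector element-wise.
        return torch.sigmoid(self.gate(fused)) * self.info(fused)


def transe_margin_loss(head, relation, tail, neg_tail, margin: float = 1.0):
    """TransE margin-ranking loss: push ||h + r - t|| below ||h + r - t'||
    by at least `margin`, for a corrupted (negative) tail t'."""
    pos = torch.norm(head + relation - tail, p=2, dim=-1)
    neg = torch.norm(head + relation - neg_tail, p=2, dim=-1)
    return F.relu(margin + pos - neg).mean()


# Usage sketch: refine detector region features with AoA, and add the
# TransE term to the captioning loss as a knowledge regularizer
# (the 0.1 weight is an assumed hyperparameter, not from the paper).
aoa = AttentionOnAttention(dim=512)
regions = torch.randn(2, 36, 512)            # e.g. object region features
refined = aoa(regions, regions, regions)     # self-attention over objects

h, r, t, t_neg = (torch.randn(16, 512) for _ in range(4))
knowledge_reg = 0.1 * transe_margin_loss(h, r, t, t_neg)
# total_loss = caption_loss + knowledge_reg
```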

Original language: English
Title of host publication: Knowledge Graph and Semantic Computing
Subtitle of host publication: Knowledge Graph and Cognitive Intelligence - 5th China Conference, CCKS 2020, Revised Selected Papers
Editors: Huajun Chen, Kang Liu, Yizhou Sun, Suge Wang, Lei Hou
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 312-321
Number of pages: 10
ISBN (Print): 9789811619632
DOI: 10.1007/978-981-16-1964-9_25
Publication status: Published - 2021
Event: 5th China Conference on Knowledge Graph and Semantic Computing, CCKS 2020 - Nanchang, China
Duration: 12 Nov 2020 - 15 Nov 2020

Publication series

Name: Communications in Computer and Information Science
Volume: 1356 CCIS
ISSN (Print): 1865-0929
ISSN (Electronic): 1865-0937

Conference

Conference: 5th China Conference on Knowledge Graph and Semantic Computing, CCKS 2020
Country/Territory: China
City: Nanchang
Period: 12/11/20 - 15/11/20

Keywords

  • Image captioning
  • Knowledge embedding
  • Knowledge representation
  • Multi-head attention

Cite this

Song, D., Peng, C., Yang, H., & Liao, L. (2021). Exploiting Knowledge Embedding to Improve the Description for Image Captioning. In H. Chen, K. Liu, Y. Sun, S. Wang, & L. Hou (Eds.), Knowledge Graph and Semantic Computing: Knowledge Graph and Cognitive Intelligence - 5th China Conference, CCKS 2020, Revised Selected Papers (pp. 312-321). (Communications in Computer and Information Science; Vol. 1356 CCIS). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-981-16-1964-9_25