A deep reinforced training method for location-based image captioning

Lei Zhao, Chunxia Zhang, Xi Zhang, Yating Hu*, Zhendong Niu

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Neural encoder-decoder frameworks have been used extensively in image captioning. Recent research has shown that reinforcement learning can train these frameworks directly on non-differentiable evaluation metrics. However, captions generated by models trained this way usually have limited grammaticality and readability. In this paper, we propose a novel model with a location-based mechanism, which introduces the location information of each region in the image, and a combined training method that integrates the cross-entropy loss with reinforcement learning. We evaluate our model on four public benchmarks: Flickr8k, Flickr30k, MSCOCO, and Image Chinese Captioning (ICC). Experimental results show that our model improves the readability of the generated captions and outperforms state-of-the-art methods across different evaluation metrics.

Original language: English
Title of host publication: PRICAI 2018
Subtitle of host publication: Trends in Artificial Intelligence - 15th Pacific Rim International Conference on Artificial Intelligence, Proceedings
Editors: Byeong-Ho Kang, Xin Geng
Publisher: Springer Verlag
Pages: 878-890
Number of pages: 13
ISBN (Print): 9783319973036
DOIs
Publication status: Published - 2018
Event: 15th Pacific Rim International Conference on Artificial Intelligence, PRICAI 2018 - Nanjing, China
Duration: 28 Aug 2018 to 31 Aug 2018

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11012 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 15th Pacific Rim International Conference on Artificial Intelligence, PRICAI 2018
Country/Territory: China
City: Nanjing
Period: 28/08/18 to 31/08/18

Keywords

  • Combined training
  • Image captioning
  • Location-based mechanism
