Past is important: Improved image captioning by looking back in time

Yiwei Wei, Chunlei Wu, Zhi Yang Jia, Xu Fei Hu, Shuang Guo, Haitao Shi*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

10 Citations (Scopus)

Abstract

A major development in the area of image captioning consists of incorporating visual attention into the design of language generative models. However, most previous studies only emphasize its role in enhancing the visual composition at the current moment, while neglecting its role in global sequence reasoning. This problem appears not only in the captioning model but also in the reinforcement learning structure. To tackle this issue, we first propose a Visual Reserved model that enables previous visual context to be considered in the current sequence reasoning. Next, an Attentional-Fluctuation Supervised model is proposed within the reinforcement learning structure. Compared with traditional strategies that take only non-differentiable Natural Language Processing (NLP) metrics as the incentive standard, the proposed model regards the fluctuation of previous attention matrices as an important indicator for judging the convergence of the captioning model. The proposed methods have been tested on the MS-COCO captioning dataset and achieve competitive results as evaluated by the evaluation server of the MS COCO captioning challenge.
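As a rough illustration of the two ideas sketched in the abstract, the Python/PyTorch snippet below shows (1) an attention step that feeds the previous visual context back into the current query, so earlier attended regions can influence the current reasoning step, and (2) a simple fluctuation score between consecutive attention maps that could serve as a stability indicator during training. All names (LookBackAttention, attention_fluctuation), dimensions, and design details are assumptions for illustration only; this is not the authors' implementation.

# Hypothetical sketch, not the paper's released code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LookBackAttention(nn.Module):
    """Additive attention whose query also includes the previous visual context."""
    def __init__(self, feat_dim, hid_dim):
        super().__init__()
        self.q_proj = nn.Linear(hid_dim + feat_dim, hid_dim)  # hidden state + previous context
        self.k_proj = nn.Linear(feat_dim, hid_dim)
        self.score = nn.Linear(hid_dim, 1)

    def forward(self, feats, hidden, prev_ctx):
        # feats: (B, R, feat_dim) region features; hidden: (B, hid_dim) decoder state;
        # prev_ctx: (B, feat_dim) visual context attended at the previous step.
        q = torch.tanh(self.q_proj(torch.cat([hidden, prev_ctx], dim=-1)))   # (B, hid_dim)
        k = self.k_proj(feats)                                               # (B, R, hid_dim)
        e = self.score(torch.tanh(k + q.unsqueeze(1))).squeeze(-1)           # (B, R) scores
        alpha = F.softmax(e, dim=-1)                                         # attention weights
        ctx = torch.bmm(alpha.unsqueeze(1), feats).squeeze(1)                # (B, feat_dim) new context
        return ctx, alpha

def attention_fluctuation(alphas):
    # alphas: list of (B, R) attention maps over decoding steps.
    # Mean L2 difference between consecutive maps; a small value suggests
    # the attention behaviour has stabilised.
    diffs = [torch.norm(a2 - a1, dim=-1) for a1, a2 in zip(alphas[:-1], alphas[1:])]
    return torch.stack(diffs, dim=0).mean()

if __name__ == "__main__":
    B, R, feat_dim, hid_dim = 2, 36, 2048, 512
    attn = LookBackAttention(feat_dim, hid_dim)
    feats = torch.randn(B, R, feat_dim)
    hidden = torch.randn(B, hid_dim)
    prev_ctx = torch.zeros(B, feat_dim)   # no context before the first step
    alphas = []
    for _ in range(5):                    # a few decoding steps
        prev_ctx, alpha = attn(feats, hidden, prev_ctx)
        alphas.append(alpha)
    print("attention fluctuation:", attention_fluctuation(alphas).item())

In the paper's reinforcement learning setting, such a fluctuation measure is described as a convergence indicator alongside the usual non-differentiable NLP metrics; how it is combined with the reward is not specified here, so the snippet only shows the measurement itself.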

Original language: English
Article number: 116183
Journal: Signal Processing: Image Communication
Volume: 94
DOIs
Publication status: Published - May 2021
Externally published: Yes

Keywords

  • Image captioning
  • Reinforcement learning
  • Visual attention
