REVnet: Bring reviewing into video captioning for a better description

Huidong Li, Dandan Song, Lejian Liao, Cuimei Peng

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-review

7 Citations (Scopus)

Abstract

Recently, the task of automatically generating a textual description of a video has attracted increasing interest. The attention-based encoder-decoder framework has been widely applied in this domain. However, video captioning is more challenging than other captioning tasks, such as image captioning, because semantic information spread across frames is hard to extract. In this paper, we propose a reviewing network (REVnet), combined with the conventional encoder-decoder framework, that reconstructs the previous hidden state. REVnet brings a backward flow into the caption generation process, which encourages the hidden state to embed more information and makes the semantics of the generated sentence more coherent. Furthermore, REVnet can regularize the attention mechanism within the framework, which encourages the model to better utilize the semantic information extracted from multiple frames. Our experimental results on benchmark datasets demonstrate that REVnet significantly improves on the baseline method. We also use a reinforcement learning method to fine-tune the model and obtain better results than state-of-the-art methods.
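The core idea of a reviewing step, reconstructing the previous decoder hidden state h_{t-1} from the current one h_t and penalizing the reconstruction error alongside the captioning loss, can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the linear-plus-tanh reviewer, the squared-error loss, the weighting factor `lam`, and the function names `reviewing_loss` and `total_loss` are all hypothetical choices made here for clarity.

```python
import numpy as np

def reviewing_loss(hidden_states, W_rev, b_rev):
    """Hypothetical reviewing loss: reconstruct h_{t-1} from h_t.

    hidden_states: array of shape (T, d) holding decoder states h_1..h_T.
    W_rev (d, d), b_rev (d,): parameters of an ASSUMED linear+tanh reviewer.
    Returns the mean squared reconstruction error over steps t = 2..T.
    """
    h_prev = hidden_states[:-1]  # reconstruction targets h_1 .. h_{T-1}
    h_curr = hidden_states[1:]   # reviewer inputs       h_2 .. h_T
    # tanh keeps reconstructions in (-1, 1), the range of LSTM hidden states
    h_prev_hat = np.tanh(h_curr @ W_rev + b_rev)
    # sum squared error over the hidden dimension, average over time steps
    return float(np.mean(np.sum((h_prev_hat - h_prev) ** 2, axis=1)))

def total_loss(caption_nll, hidden_states, W_rev, b_rev, lam=0.1):
    """Captioning loss plus a weighted reviewing term (lam is an assumed weight)."""
    return caption_nll + lam * reviewing_loss(hidden_states, W_rev, b_rev)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, d = 5, 8
    H = np.tanh(rng.standard_normal((T, d)))  # stand-in decoder hidden states
    W = 0.1 * rng.standard_normal((d, d))
    b = np.zeros(d)
    print(total_loss(2.3, H, W, b))
```

Because the reviewing term depends on every hidden state, its gradient flows back through the decoder, which is one way to read the abstract's "backward flow": each h_t is pushed to retain information about h_{t-1}.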

Original language: English
Title of host publication: Proceedings - 2019 IEEE International Conference on Multimedia and Expo, ICME 2019
Publisher: IEEE Computer Society
Pages: 1312-1317
Number of pages: 6
ISBN (Electronic): 9781538695524
DOI
Publication status: Published - Jul 2019
Event: 2019 IEEE International Conference on Multimedia and Expo, ICME 2019 - Shanghai, China
Duration: 8 Jul 2019 to 12 Jul 2019

Publication series

Name: Proceedings - IEEE International Conference on Multimedia and Expo
Volume: 2019-July
ISSN (Print): 1945-7871
ISSN (Electronic): 1945-788X

Conference

Conference: 2019 IEEE International Conference on Multimedia and Expo, ICME 2019
Country/Territory: China
City: Shanghai
Period: 8/07/19 to 12/07/19
