
Joint commonsense and relation reasoning for image and video captioning

  • Jingyi Hou
  • Xinxiao Wu*
  • Xiaoxun Zhang
  • Yayun Qi
  • Yunde Jia
  • Jiebo Luo
  • *Corresponding author for this work
  • Beijing Institute of Technology
  • Alibaba Group
  • University of Rochester

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-reviewed

Abstract

Exploiting relationships between objects for image and video captioning has received increasing attention. Most existing methods depend heavily on pre-trained detectors of objects and their relationships, and thus may not work well when facing detection challenges such as heavy occlusion, tiny-size objects, and long-tail classes. In this paper, we propose a joint commonsense and relation reasoning method that exploits prior knowledge for image and video captioning without relying on any detectors. The prior knowledge provides semantic correlations and constraints between objects, serving as guidance to build semantic graphs that summarize object relationships, some of which cannot be directly perceived from images or videos. Particularly, our method is implemented by an iterative learning algorithm that alternates between 1) commonsense reasoning for embedding visual regions into the semantic space to build a semantic graph and 2) relation reasoning for encoding semantic graphs to generate sentences. Experiments on several benchmark datasets validate the effectiveness of our prior knowledge-based approach.
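The alternation described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the concept vectors, the prior-relation table, the function names, and the mean-aggregation update are all hypothetical stand-ins chosen for illustration, and the sketch stops at encoded region features rather than generating a sentence.

```python
import math

# Hypothetical prior knowledge: concept vectors in a toy 2-D semantic space
# and the relations that commonsense allows between concept pairs.
CONCEPTS = {"person": (1.0, 0.0), "horse": (0.0, 1.0), "field": (-1.0, 0.0)}
PRIOR_RELATIONS = {("person", "horse"): "ride", ("horse", "field"): "on"}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def commonsense_step(region_features):
    """Assign each region its nearest concept, then add the edges that the
    prior-relation table permits. Returns (nodes, edges)."""
    nodes = [max(CONCEPTS, key=lambda c: cosine(f, CONCEPTS[c]))
             for f in region_features]
    edges = [(i, j, PRIOR_RELATIONS[(nodes[i], nodes[j])])
             for i in range(len(nodes))
             for j in range(len(nodes))
             if i != j and (nodes[i], nodes[j]) in PRIOR_RELATIONS]
    return nodes, edges


def relation_step(region_features, edges):
    """Encode the graph with one round of mean aggregation over each node and
    its neighbours (a simple stand-in for a learned graph encoder)."""
    updated = []
    for i, feat in enumerate(region_features):
        neighbours = [region_features[t] for (s, t, _) in edges if s == i]
        neighbours += [region_features[s] for (s, t, _) in edges if t == i]
        group = [feat] + neighbours
        updated.append(tuple(sum(v[d] for v in group) / len(group)
                             for d in range(len(feat))))
    return updated


def alternate(region_features, rounds=1):
    """Alternate between graph building and graph encoding."""
    feats, nodes, edges = list(region_features), [], []
    for _ in range(rounds):
        nodes, edges = commonsense_step(feats)
        feats = relation_step(feats, edges)
    return nodes, edges, feats
```

Calling `alternate([(0.9, 0.1), (0.1, 0.9), (-0.8, 0.1)])` runs one round of each step on three made-up region vectors. In the paper both steps are learned jointly; here they are fixed rules so that the control flow is easy to follow.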

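Original language: English
Title of host publication: AAAI 2020 - 34th AAAI Conference on Artificial Intelligence
Publisher: AAAI press
Pages: 10973-10980
Number of pages: 8
ISBN (electronic): 9781577358350
Publication status: Published - 2020
Event: 34th AAAI Conference on Artificial Intelligence, AAAI 2020 - New York, United States
Duration: 7 Feb 2020 to 12 Feb 2020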

Publication series

Name: AAAI 2020 - 34th AAAI Conference on Artificial Intelligence

Conference

Conference: 34th AAAI Conference on Artificial Intelligence, AAAI 2020
Country/Territory: United States
City: New York
Period: 7/02/20 to 12/02/20
