Predicting image caption by a unified hierarchical model

Lin Bai, Kan Li

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

4 citations (Scopus)

Abstract

Automatically describing the content of an image is a challenging task in artificial intelligence. The difficulty is particularly pronounced in activity recognition and in captioning that depends on analyzing the relationships among the activities in the image. This paper presents a unified hierarchical model that models the interaction activity between a human and a nearby object, and then infers the image content by analyzing the logical relationships among the interaction activities. In our model, the first-layer factored three-way interaction machine models the 3D spatial context between the human and the relevant object to directly aid the prediction of human-object interaction activities. The activities are then processed by the top-layer factored three-way interaction machine, which learns the image content with the help of the 3D spatial context among the activities. Experiments on a joint dataset show that our unified hierarchical model outperforms state-of-the-art methods in predicting human-object interaction activities and in generating image captions.

Original language: English
Title of host publication: 2015 IEEE International Conference on Multimedia and Expo, ICME 2015
Publisher: IEEE Computer Society
ISBN (Electronic): 9781479970827
Publication status: Published - 4 Aug 2015
Event: IEEE International Conference on Multimedia and Expo, ICME 2015 - Turin, Italy
Duration: 29 Jun 2015 to 3 Jul 2015

Publication series

Name: Proceedings - IEEE International Conference on Multimedia and Expo
Volume: 2015-August
ISSN (Print): 1945-7871
ISSN (Electronic): 1945-788X

Conference

Conference: IEEE International Conference on Multimedia and Expo, ICME 2015
Country/Territory: Italy
City: Turin
Period: 29/06/15 to 3/07/15
