Predicting image caption by a unified hierarchical model

Lin Bai, Kan Li

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

4 Citations (Scopus)

Abstract

Automatically describing the content of an image is a challenging task in artificial intelligence. The difficulty is particularly pronounced in activity recognition and in deriving an image caption from an analysis of the relationships among the activities in the image. This paper presents a unified hierarchical model that models the interaction activities between a human and nearby objects, and then infers the image content by analyzing the logical relationships among those interaction activities. In our model, the first-layer factored three-way interaction machine models the 3D spatial context between the human and the relevant object to directly aid the prediction of human-object interaction activities. The activities are then processed by the top-layer factored three-way interaction machine, which learns the image content with the help of the 3D spatial context among the activities. Experiments on a joint dataset show that our unified hierarchical model outperforms state-of-the-art methods in predicting human-object interaction activities and in generating image captions.
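
The abstract does not give the equations of the factored three-way interaction machine, so the following is only an illustrative sketch. It assumes the factorization commonly used for three-way interactions, in which the full weight tensor W[i][j][k] is approximated by a sum over factors of the products wx[i][f] * wy[j][f] * wh[k][f]. The names `x`, `y`, `h`, `wx`, `wy`, and `wh` are hypothetical (for example, human features, object features, and activity units) and are not taken from the paper.

```python
from itertools import product


def project(vec, weights):
    """Project vec onto each factor f: sum_i vec[i] * weights[i][f]."""
    n_factors = len(weights[0])
    return [sum(v * row[f] for v, row in zip(vec, weights))
            for f in range(n_factors)]


def factored_score(x, y, h, wx, wy, wh):
    """Three-way interaction score computed through the factors.

    Cost grows with (len(x) + len(y) + len(h)) * n_factors
    instead of len(x) * len(y) * len(h).
    """
    px, py, ph = project(x, wx), project(y, wy), project(h, wh)
    return sum(a * b * c for a, b, c in zip(px, py, ph))


def full_tensor_score(x, y, h, wx, wy, wh):
    """Same score, computed by expanding the implied full tensor W[i][j][k]."""
    n_factors = len(wx[0])
    total = 0.0
    for i, j, k in product(range(len(x)), range(len(y)), range(len(h))):
        w_ijk = sum(wx[i][f] * wy[j][f] * wh[k][f] for f in range(n_factors))
        total += w_ijk * x[i] * y[j] * h[k]
    return total


# Toy inputs (assumed values, for illustration only).
x = [1, 2]                 # hypothetical human features
y = [3, 4]                 # hypothetical object features
h = [1, 0]                 # hypothetical activity units
wx = [[1, 0], [0, 1]]      # rows index x, columns index factors
wy = [[1, 1], [0, 1]]
wh = [[2, 1], [1, 1]]

score = factored_score(x, y, h, wx, wy, wh)
```

Both functions return the same value for the same weights; the factored form simply avoids materializing the three-way tensor, which is the usual motivation for factoring. A hierarchical model in the spirit of the abstract would presumably stack two such scoring layers, but that arrangement is not specified here.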

Original language: English
Title of host publication: 2015 IEEE International Conference on Multimedia and Expo, ICME 2015
Publisher: IEEE Computer Society
ISBN (Electronic): 9781479970827
Publication status: Published - 4 Aug 2015
Event: IEEE International Conference on Multimedia and Expo, ICME 2015 - Turin, Italy
Duration: 29 Jun 2015 to 3 Jul 2015

Publication series

Name: Proceedings - IEEE International Conference on Multimedia and Expo
Volume: 2015-August
ISSN (Print): 1945-7871
ISSN (Electronic): 1945-788X

Conference

Conference: IEEE International Conference on Multimedia and Expo, ICME 2015
Country/Territory: Italy
City: Turin
Period: 29/06/15 to 3/07/15

Keywords

  • 3D spatial context
  • Factored three-way interaction
  • Human-object interaction activity
  • Image caption
  • Unified hierarchical model
