Predicting image caption by a unified hierarchical model

Lin Bai; Kan Li

doi:10.1109/ICME.2015.7177427

School of Computer Science and Technology

Beijing Institute of Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Automatically describing the content of an image is a challenging task in artificial intelligence. The difficulty is particularly pronounced in activity recognition and in inferring an image caption from the relationships among the activities in the image. This paper presents a unified hierarchical model that models the interaction activity between a human and a nearby object, and then infers the image content by analyzing the logical relationships among the interaction activities. In our model, the first-layer factored three-way interaction machine models the 3D spatial context between the human and the relevant object to directly aid the prediction of human-object interaction activities. The activities are then processed by the top-layer factored three-way interaction machine, which learns the image content with the help of the 3D spatial context among the activities. Experiments on a joint dataset show that our unified hierarchical model outperforms state-of-the-art methods in predicting human-object interaction activities and in generating image captions.
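
The abstract does not spell out the form of the factored three-way interaction, so the following is only a minimal sketch of the generic factored three-way idea, not the authors' implementation. It assumes that human features, object features, and hidden activity units each get projected onto a shared set of factors, and that the hidden units are driven by the element-wise product of the two input projections. All names (`project`, `hidden_probabilities`, `w_human`, `w_obj`, `w_hidden`), the layer sizes, and the random weights are hypothetical.

```python
import math
import random


def project(vec, weights):
    """Project a feature vector onto the factors (one dot product per factor)."""
    n_factors = len(weights[0])
    return [sum(vec[i] * weights[i][f] for i in range(len(vec)))
            for f in range(n_factors)]


def hidden_probabilities(human, obj, w_human, w_obj, w_hidden):
    """Sigmoid activation of each hidden unit given both inputs.

    Assumed form: activation_k = sum_f w_hidden[k][f] * (human . w_human[:, f])
                                                      * (obj . w_obj[:, f])
    """
    factor_human = project(human, w_human)
    factor_obj = project(obj, w_obj)
    # Multiplicative (three-way) coupling happens per factor.
    products = [a * b for a, b in zip(factor_human, factor_obj)]
    probs = []
    for k in range(len(w_hidden)):
        activation = sum(w_hidden[k][f] * products[f]
                         for f in range(len(products)))
        probs.append(1.0 / (1.0 + math.exp(-activation)))
    return probs


def random_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; a trained model would learn these instead."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


# Hypothetical sizes: 6 human features, 4 object features, 3 hidden units, 5 factors.
rng = random.Random(0)
n_human, n_obj, n_hidden, n_factors = 6, 4, 3, 5
w_human = random_matrix(n_human, n_factors, rng)
w_obj = random_matrix(n_obj, n_factors, rng)
w_hidden = random_matrix(n_hidden, n_factors, rng)

human = [rng.random() for _ in range(n_human)]
obj = [rng.random() for _ in range(n_obj)]
probs = hidden_probabilities(human, obj, w_human, w_obj, w_hidden)
```

Under this assumption, a hierarchy like the one described could stack a second such layer whose inputs are the activity-level outputs of the first, but how the paper actually connects the layers and encodes the 3D spatial context is not stated in the abstract.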

Original language	English
Title of host publication	2015 IEEE International Conference on Multimedia and Expo, ICME 2015
Publisher	IEEE Computer Society
ISBN (Electronic)	9781479970827
DOIs	https://doi.org/10.1109/ICME.2015.7177427
Publication status	Published - 4 Aug 2015
Event	IEEE International Conference on Multimedia and Expo, ICME 2015 - Turin, Italy, 29 Jun 2015 → 3 Jul 2015

Publication series

Name	Proceedings - IEEE International Conference on Multimedia and Expo
Volume	2015-August
ISSN (Print)	1945-7871
ISSN (Electronic)	1945-788X

Conference

Conference	IEEE International Conference on Multimedia and Expo, ICME 2015
Country/Territory	Italy
City	Turin
Period	29/06/15 → 3/07/15

Keywords

3D spatial context
Factored three-way interaction
Human-object interaction activity
Image caption
Unified hierarchical model

Cite this

Bai, L., & Li, K. (2015). Predicting image caption by a unified hierarchical model. In 2015 IEEE International Conference on Multimedia and Expo, ICME 2015 Article 7177427 (Proceedings - IEEE International Conference on Multimedia and Expo; Vol. 2015-August). IEEE Computer Society. https://doi.org/10.1109/ICME.2015.7177427
