Generating image description by modeling spatial context of an image

Kan Li; Lin Bai

doi:10.1109/IJCNN.2015.7280652

School of Computer Science and Technology, Beijing Institute of Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

4 Citations (Scopus)

Abstract

Generating descriptive sentences for a real image is a challenging task in image understanding. The difficulty lies mainly in recognizing the interaction activities between objects and in predicting the relationship between objects and stuff/scene. In this paper, we propose a framework that improves image description generation by addressing these problems. The framework mainly comprises two models: a unified spatial context model and an image description generation model. The former, the centerpiece of the framework, models 3D spatial context to learn human-object interaction activities and to predict the semantic relationship between these activities and stuff/scene. The spatial context model casts these problems as latent structured labeling problems, which can be solved by a unified mathematical optimization. Then, based on the semantic relationship, the image description generation model generates descriptive sentences for the image using the proposed lexicalized tree-based algorithm. Experiments on a joint dataset show that our framework outperforms state-of-the-art methods in spatial co-occurrence context analysis, human-object interaction recognition, and image description generation.

Original language	English
Title of host publication	2015 International Joint Conference on Neural Networks, IJCNN 2015
Publisher	Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic)	9781479919604
DOIs	https://doi.org/10.1109/IJCNN.2015.7280652
Publication status	Published - 28 Sept 2015
Event	International Joint Conference on Neural Networks, IJCNN 2015 - Killarney, Ireland. Duration: 12 Jul 2015 → 17 Jul 2015

Publication series

Name	Proceedings of the International Joint Conference on Neural Networks
Volume	2015-September

Conference

Conference	International Joint Conference on Neural Networks, IJCNN 2015
Country/Territory	Ireland
City	Killarney
Period	12/07/15 → 17/07/15

Keywords

Image recognition
Layout
Semantics

Access to Document

10.1109/IJCNN.2015.7280652

Cite this

Li, K., & Bai, L. (2015). Generating image description by modeling spatial context of an image. In 2015 International Joint Conference on Neural Networks, IJCNN 2015 Article 7280652 (Proceedings of the International Joint Conference on Neural Networks; Vol. 2015-September). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/IJCNN.2015.7280652

@inproceedings{26989f201b914f82819ebc0bb4d06c7d,

title = "Generating image description by modeling spatial context of an image",

abstract = "Generating descriptive sentences for a real image is a challenging task in image understanding. The difficulty lies mainly in recognizing the interaction activities between objects and in predicting the relationship between objects and stuff/scene. In this paper, we propose a framework that improves image description generation by addressing these problems. The framework mainly comprises two models: a unified spatial context model and an image description generation model. The former, the centerpiece of the framework, models 3D spatial context to learn human-object interaction activities and to predict the semantic relationship between these activities and stuff/scene. The spatial context model casts these problems as latent structured labeling problems, which can be solved by a unified mathematical optimization. Then, based on the semantic relationship, the image description generation model generates descriptive sentences for the image using the proposed lexicalized tree-based algorithm. Experiments on a joint dataset show that our framework outperforms state-of-the-art methods in spatial co-occurrence context analysis, human-object interaction recognition, and image description generation.",

keywords = "Image recognition, Layout, Semantics",

author = "Kan Li and Lin Bai",

note = "Publisher Copyright: {\textcopyright} 2015 IEEE.; International Joint Conference on Neural Networks, IJCNN 2015 ; Conference date: 12-07-2015 Through 17-07-2015",

year = "2015",

month = sep,

day = "28",

doi = "10.1109/IJCNN.2015.7280652",

language = "English",

series = "Proceedings of the International Joint Conference on Neural Networks",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

booktitle = "2015 International Joint Conference on Neural Networks, IJCNN 2015",

address = "United States",

}

Li, K & Bai, L 2015, Generating image description by modeling spatial context of an image. in 2015 International Joint Conference on Neural Networks, IJCNN 2015, 7280652, Proceedings of the International Joint Conference on Neural Networks, vol. 2015-September, Institute of Electrical and Electronics Engineers Inc., International Joint Conference on Neural Networks, IJCNN 2015, Killarney, Ireland, 12/07/15. https://doi.org/10.1109/IJCNN.2015.7280652

Generating image description by modeling spatial context of an image. / Li, Kan; Bai, Lin.
2015 International Joint Conference on Neural Networks, IJCNN 2015. Institute of Electrical and Electronics Engineers Inc., 2015. 7280652 (Proceedings of the International Joint Conference on Neural Networks; Vol. 2015-September).


TY - GEN

T1 - Generating image description by modeling spatial context of an image

AU - Li, Kan

AU - Bai, Lin

PY - 2015/9/28

Y1 - 2015/9/28


AB - Generating descriptive sentences for a real image is a challenging task in image understanding. The difficulty lies mainly in recognizing the interaction activities between objects and in predicting the relationship between objects and stuff/scene. In this paper, we propose a framework that improves image description generation by addressing these problems. The framework mainly comprises two models: a unified spatial context model and an image description generation model. The former, the centerpiece of the framework, models 3D spatial context to learn human-object interaction activities and to predict the semantic relationship between these activities and stuff/scene. The spatial context model casts these problems as latent structured labeling problems, which can be solved by a unified mathematical optimization. Then, based on the semantic relationship, the image description generation model generates descriptive sentences for the image using the proposed lexicalized tree-based algorithm. Experiments on a joint dataset show that our framework outperforms state-of-the-art methods in spatial co-occurrence context analysis, human-object interaction recognition, and image description generation.

KW - Image recognition

KW - Layout

KW - Semantics

UR - http://www.scopus.com/inward/record.url?scp=84951155486&partnerID=8YFLogxK

U2 - 10.1109/IJCNN.2015.7280652

DO - 10.1109/IJCNN.2015.7280652

M3 - Conference contribution

AN - SCOPUS:84951155486

T3 - Proceedings of the International Joint Conference on Neural Networks

BT - 2015 International Joint Conference on Neural Networks, IJCNN 2015

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - International Joint Conference on Neural Networks, IJCNN 2015

Y2 - 12 July 2015 through 17 July 2015

ER -
