TY - GEN
T1 - Generating image description by modeling spatial context of an image
AU - Li, Kan
AU - Bai, Lin
N1 - Publisher Copyright:
© 2015 IEEE.
PY - 2015/9/28
Y1 - 2015/9/28
N2 - Generating descriptive sentences for a real image is a challenging task in image understanding. The difficulty mainly lies in recognizing the interaction activities between objects and in predicting the relationships between objects and stuff/scene. In this paper, we propose a framework that improves image description generation by addressing these problems. Our framework mainly includes two models: a unified spatial context model and an image description generation model. The former, as the centerpiece of our framework, models 3D spatial context to learn human-object interaction activities and to predict the semantic relationships between these activities and stuff/scene. The spatial context model casts the problems as latent structured labeling problems, which are resolved through a unified mathematical optimization. Then, based on the semantic relationships, the image description generation model generates descriptive sentences through the proposed lexicalized tree-based algorithm. Experiments on a joint dataset show that our framework outperforms state-of-the-art methods in spatial co-occurrence context analysis, human-object interaction recognition, and image description generation.
AB - Generating descriptive sentences for a real image is a challenging task in image understanding. The difficulty mainly lies in recognizing the interaction activities between objects and in predicting the relationships between objects and stuff/scene. In this paper, we propose a framework that improves image description generation by addressing these problems. Our framework mainly includes two models: a unified spatial context model and an image description generation model. The former, as the centerpiece of our framework, models 3D spatial context to learn human-object interaction activities and to predict the semantic relationships between these activities and stuff/scene. The spatial context model casts the problems as latent structured labeling problems, which are resolved through a unified mathematical optimization. Then, based on the semantic relationships, the image description generation model generates descriptive sentences through the proposed lexicalized tree-based algorithm. Experiments on a joint dataset show that our framework outperforms state-of-the-art methods in spatial co-occurrence context analysis, human-object interaction recognition, and image description generation.
KW - Image recognition
KW - Layout
KW - Semantics
UR - http://www.scopus.com/inward/record.url?scp=84951155486&partnerID=8YFLogxK
U2 - 10.1109/IJCNN.2015.7280652
DO - 10.1109/IJCNN.2015.7280652
M3 - Conference contribution
AN - SCOPUS:84951155486
T3 - Proceedings of the International Joint Conference on Neural Networks
BT - 2015 International Joint Conference on Neural Networks, IJCNN 2015
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - International Joint Conference on Neural Networks, IJCNN 2015
Y2 - 12 July 2015 through 17 July 2015
ER -