TY - GEN
T1 - Recognizing visual composite in real images
AU - Bai, Lin
AU - Li, Kan
AU - Jiang, Shuai
N1 - Publisher Copyright: © 2015 IEEE.
PY - 2015/9/28
Y1 - 2015/9/28
AB - Automatically discovering and recognizing the main structured visual pattern of an image is a challenging problem. The main difficulties are finding the component objects and recognizing the interactions among them. The component objects of a structured visual pattern have a consistent 3D spatial co-occurrence layout across images, which manifests itself as a predictable pattern called a visual composite. In this paper, we propose a visual composite recognition model to automatically discover and recognize the visual composite of an image. Our model first learns 3D spatial co-occurrence statistics among objects to discover the potential structured visual pattern of an image, so that it captures the component objects of the visual composite. Second, we construct a feedforward architecture using the proposed factored three-way interaction machine to recognize the visual composite, which casts the recognition problem as a structured prediction task. It predicts the visual composite by maximizing the probability of the correct structured label given the component objects and their 3D spatial context. Experiments on a six-class sports dataset and a phrasal recognition dataset demonstrate encouraging performance of our model in discovery precision and recognition accuracy compared with competing approaches.
KW - Computational modeling
KW - Image recognition
KW - Three-dimensional displays
KW - Visualization
UR - http://www.scopus.com/inward/record.url?scp=84951028520&partnerID=8YFLogxK
U2 - 10.1109/IJCNN.2015.7280523
DO - 10.1109/IJCNN.2015.7280523
M3 - Conference contribution
AN - SCOPUS:84951028520
T3 - Proceedings of the International Joint Conference on Neural Networks
BT - 2015 International Joint Conference on Neural Networks, IJCNN 2015
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - International Joint Conference on Neural Networks, IJCNN 2015
Y2 - 12 July 2015 through 17 July 2015
ER -