What is where: Inferring containment relations from videos

Liang Wei, Yibiao Zhao, Yixin Zhu, Song Chun Zhu

Research output: Contribution to journal › Conference article › peer-review

21 Citations (Scopus)

Abstract

In this paper, we present a probabilistic approach to explicitly infer containment relations between objects in 3D scenes. Given an input RGB-D video, our algorithm quantizes the perceptual space of a 3D scene by reasoning about containment relations over time. At each frame, we represent the containment relations in space by a containment graph, where each vertex represents an object and each edge represents a containment relation. We assume that human actions are the only cause of changes in containment relations over time, and classify human actions into four types of events: move-in, move-out, no-change and paranormal-change. Here, paranormal-change refers to events that are physically infeasible and are therefore ruled out through reasoning. A dynamic programming algorithm is used to find both the optimal sequence of containment relations across the video and the containment relation changes between adjacent frames. We evaluate the proposed method on our dataset of 1326 video clips taken in 9 indoor scenes, including some challenging cases, such as heavy occlusions and diverse changes of containment relations. The experimental results demonstrate good performance on the dataset.
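The dynamic-programming step described in the abstract can be illustrated with a minimal Viterbi-style sketch in Python. This is not the authors' implementation: it tracks a single object, and the state encoding (`ROOM` meaning "not inside any container"), the rule that a direct jump between two containers counts as a paranormal-change, the `EVENT_SCORE` values, and the function names `classify_event` and `infer_containment` are all assumptions made for illustration.

```python
import math

ROOM = None  # hypothetical state: the object is not inside any container


def classify_event(prev, curr):
    """Label the change in one object's container between adjacent frames."""
    if prev == curr:
        return "no-change"
    if prev is ROOM:
        return "move-in"
    if curr is ROOM:
        return "move-out"
    # Assumption: jumping directly between two containers is infeasible.
    return "paranormal-change"


# Hypothetical transition log-scores; paranormal changes are ruled out.
EVENT_SCORE = {
    "no-change": 0.0,
    "move-in": -1.0,
    "move-out": -1.0,
    "paranormal-change": -math.inf,
}


def infer_containment(frame_scores):
    """Viterbi-style DP; frame_scores[t][state] is a per-frame log-score.

    Returns the best state sequence and the event between adjacent frames.
    """
    states = list(frame_scores[0])
    best = dict(frame_scores[0])
    back = []  # back[t][state] = best predecessor of state at frame t + 1
    for scores in frame_scores[1:]:
        new_best, pointers = {}, {}
        for curr in states:
            prev = max(
                states,
                key=lambda p: best[p] + EVENT_SCORE[classify_event(p, curr)],
            )
            step = EVENT_SCORE[classify_event(prev, curr)]
            new_best[curr] = best[prev] + step + scores[curr]
            pointers[curr] = prev
        best = new_best
        back.append(pointers)

    # Trace the best final state back to the first frame.
    path = [max(states, key=lambda s: best[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    events = [classify_event(a, b) for a, b in zip(path, path[1:])]
    return path, events


# Made-up per-frame log-scores for one object and two candidate containers.
frame_scores = [
    {ROOM: -0.1, "box": -3.0, "bag": -3.0},
    {ROOM: -0.2, "box": -2.5, "bag": -3.0},
    {ROOM: -4.0, "box": -0.1, "bag": -3.0},
    {ROOM: -4.0, "box": -0.1, "bag": -3.0},
]
path, events = infer_containment(frame_scores)
```

Giving the paranormal-change transition a score of negative infinity means a sequence containing it can never be selected over a feasible one, which mirrors how the abstract describes infeasible events being ruled out. A full containment graph over several objects would need a joint state space rather than this one-object chain.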

Original language: English
Pages (from-to): 3418-3424
Number of pages: 7
Journal: IJCAI International Joint Conference on Artificial Intelligence
Volume: 2016-January
Publication status: Published - 2016
Event: 25th International Joint Conference on Artificial Intelligence, IJCAI 2016 - New York, United States
Duration: 9 Jul 2016 to 15 Jul 2016
