Inferring social roles in long timespan video sequence

Jiangen Zhang; Wenze Hu; Benjamin Yao; Yongtian Wang; Song Chun Zhu

doi:10.1109/ICCVW.2011.6130422


Jiangen Zhang^*, Wenze Hu, Benjamin Yao, Yongtian Wang, Song Chun Zhu

^*Corresponding author for this work

School of Optics and Photonics

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

6 Citations (Scopus)

Abstract

In this paper, we present a method for inferring the social roles of agents (persons) from their daily activities in long surveillance video sequences. We define activities as interactions between an agent's position and semantic hotspots within the scene. Given a surveillance video, our method first tracks the locations of agents and then automatically discovers semantic hotspots in the scene. By enumerating the spatial and temporal relations between an agent's feet and the hotspots in a scene, we define a set of atomic actions, which in turn compose sub-events and events. The numbers and types of events performed by an agent are assumed to be driven by his/her social role. With the grammar model induced by the composition rules, an adapted Earley parser is used to parse the trajectories into events, sub-events and atomic actions. With the probabilistic output of events, the roles of agents can be predicted under a Bayesian inference framework. Experiments are carried out on a challenging 8.5-hour video from a surveillance camera in the lobby of a research lab. The video contains 7 different social roles: manager, researcher, developer, engineer, staff, visitor and mailman. Results show that the proposed method can predict the role of each agent with high precision.
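The final step of the abstract, predicting a role from the events an agent performs, can be illustrated with a small Bayesian sketch. This is not the authors' model: the abstract does not give the likelihood, so the code below assumes a multinomial likelihood over hard event counts, and the role names, event names, priors and probabilities are all invented for illustration.

```python
import math

# Hypothetical per-role event probabilities (each row sums to 1).
# These numbers are assumptions for illustration, not values from the paper.
EVENT_PROBS = {
    "researcher": {"enter_office": 0.50, "use_printer": 0.30,
                   "deliver_mail": 0.01, "wait_in_lobby": 0.19},
    "visitor":    {"enter_office": 0.20, "use_printer": 0.05,
                   "deliver_mail": 0.05, "wait_in_lobby": 0.70},
    "mailman":    {"enter_office": 0.10, "use_printer": 0.05,
                   "deliver_mail": 0.70, "wait_in_lobby": 0.15},
}

# Hypothetical prior over roles (also an assumption).
PRIOR = {"researcher": 0.6, "visitor": 0.3, "mailman": 0.1}


def role_posterior(event_counts, event_probs=EVENT_PROBS, prior=PRIOR):
    """Return P(role | event counts) assuming a multinomial likelihood.

    The multinomial coefficient is the same for every role, so it cancels
    in the normalisation and is omitted.
    """
    log_scores = {}
    for role, probs in event_probs.items():
        log_p = math.log(prior[role])
        for event, count in event_counts.items():
            log_p += count * math.log(probs[event])
        log_scores[role] = log_p

    # Normalise in log space (log-sum-exp) to avoid underflow on long videos.
    peak = max(log_scores.values())
    total = sum(math.exp(score - peak) for score in log_scores.values())
    return {role: math.exp(score - peak) / total
            for role, score in log_scores.items()}


if __name__ == "__main__":
    observed = {"deliver_mail": 3, "wait_in_lobby": 1}
    for role, p in sorted(role_posterior(observed).items(),
                          key=lambda item: -item[1]):
        print(f"{role}: {p:.4f}")
```

In the paper the event evidence is probabilistic parser output rather than hard counts, so a closer sketch would weight each count by the parser's confidence in that event.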

Original language	English
Title of host publication	2011 IEEE International Conference on Computer Vision Workshops, ICCV Workshops 2011
Pages	1456-1463
Number of pages	8
DOIs	https://doi.org/10.1109/ICCVW.2011.6130422
Publication status	Published - 2011
Event	2011 IEEE International Conference on Computer Vision Workshops, ICCV Workshops 2011 - Barcelona, Spain
Duration	6 Nov 2011 → 13 Nov 2011

Publication series

Name	Proceedings of the IEEE International Conference on Computer Vision

Conference

Conference	2011 IEEE International Conference on Computer Vision Workshops, ICCV Workshops 2011
Country/Territory	Spain
City	Barcelona
Period	6/11/11 → 13/11/11


Cite this

Zhang, J., Hu, W., Yao, B., Wang, Y., & Zhu, S. C. (2011). Inferring social roles in long timespan video sequence. In 2011 IEEE International Conference on Computer Vision Workshops, ICCV Workshops 2011 (pp. 1456-1463). Article 6130422 (Proceedings of the IEEE International Conference on Computer Vision). https://doi.org/10.1109/ICCVW.2011.6130422

@inproceedings{d0d13d1c39044d9eaa48a6471fdb232b,
title = "Inferring social roles in long timespan video sequence",

abstract = "In this paper, we present a method for inferring social roles of agents (persons) from their daily activities in long surveillance video sequences. We define activities as interactions between an agent's position and semantic hotspots within the scene. Given a surveillance video, our method first tracks the locations of agents then automatically discovers semantic hotspots in the scene. By enumerating spatial/temporal locations between an agent's feet and hotspots in a scene, we define a set of atomic actions, which in turn compose sub-events and events. The numbers and types of events performed by an agent are assumed to be driven by his/her social role. With the grammar model induced by composition rules, an adapted Earley parser algorithm is used to parse the trajectories into events, sub-events and atomic actions. With probabilistic output of events, the roles of agents can be predicted under the Bayesian inference framework. Experiments are carried out on a challenging 8.5 hours video from a surveillance camera in the lobby of a research lab. The video contains 7 different social roles including manager, researcher, developer, engineer, staff, visitor and mailman. Results show that our proposed method can predict the role of each agent with high precision.",

author = "Jiangen Zhang and Wenze Hu and Benjamin Yao and Yongtian Wang and Zhu, {Song Chun}",
year = "2011",
doi = "10.1109/ICCVW.2011.6130422",
language = "English",
isbn = "9781467300629",
series = "Proceedings of the IEEE International Conference on Computer Vision",
pages = "1456--1463",
booktitle = "2011 IEEE International Conference on Computer Vision Workshops, ICCV Workshops 2011",
note = "2011 IEEE International Conference on Computer Vision Workshops, ICCV Workshops 2011 ; Conference date: 06-11-2011 Through 13-11-2011",
}

Zhang, J, Hu, W, Yao, B, Wang, Y & Zhu, SC 2011, Inferring social roles in long timespan video sequence. in 2011 IEEE International Conference on Computer Vision Workshops, ICCV Workshops 2011., 6130422, Proceedings of the IEEE International Conference on Computer Vision, pp. 1456-1463, 2011 IEEE International Conference on Computer Vision Workshops, ICCV Workshops 2011, Barcelona, Spain, 6/11/11. https://doi.org/10.1109/ICCVW.2011.6130422

Inferring social roles in long timespan video sequence. / Zhang, Jiangen; Hu, Wenze; Yao, Benjamin et al.
2011 IEEE International Conference on Computer Vision Workshops, ICCV Workshops 2011. 2011. p. 1456-1463 6130422 (Proceedings of the IEEE International Conference on Computer Vision).


TY - GEN
T1 - Inferring social roles in long timespan video sequence
AU - Zhang, Jiangen
AU - Hu, Wenze
AU - Yao, Benjamin
AU - Wang, Yongtian
AU - Zhu, Song Chun
PY - 2011
Y1 - 2011

AB - In this paper, we present a method for inferring social roles of agents (persons) from their daily activities in long surveillance video sequences. We define activities as interactions between an agent's position and semantic hotspots within the scene. Given a surveillance video, our method first tracks the locations of agents then automatically discovers semantic hotspots in the scene. By enumerating spatial/temporal locations between an agent's feet and hotspots in a scene, we define a set of atomic actions, which in turn compose sub-events and events. The numbers and types of events performed by an agent are assumed to be driven by his/her social role. With the grammar model induced by composition rules, an adapted Earley parser algorithm is used to parse the trajectories into events, sub-events and atomic actions. With probabilistic output of events, the roles of agents can be predicted under the Bayesian inference framework. Experiments are carried out on a challenging 8.5 hours video from a surveillance camera in the lobby of a research lab. The video contains 7 different social roles including manager, researcher, developer, engineer, staff, visitor and mailman. Results show that our proposed method can predict the role of each agent with high precision.

UR - http://www.scopus.com/inward/record.url?scp=84863074227&partnerID=8YFLogxK
U2 - 10.1109/ICCVW.2011.6130422
DO - 10.1109/ICCVW.2011.6130422
M3 - Conference contribution
AN - SCOPUS:84863074227
SN - 9781467300629
T3 - Proceedings of the IEEE International Conference on Computer Vision
SP - 1456
EP - 1463
BT - 2011 IEEE International Conference on Computer Vision Workshops, ICCV Workshops 2011
T2 - 2011 IEEE International Conference on Computer Vision Workshops, ICCV Workshops 2011
Y2 - 6 November 2011 through 13 November 2011
ER -
