TY - GEN
T1 - Inferring social roles in long timespan video sequence
AU - Zhang, Jiangen
AU - Hu, Wenze
AU - Yao, Benjamin
AU - Wang, Yongtian
AU - Zhu, Song Chun
PY - 2011
Y1 - 2011
AB - In this paper, we present a method for inferring the social roles of agents (persons) from their daily activities in long surveillance video sequences. We define activities as interactions between an agent's position and semantic hotspots within the scene. Given a surveillance video, our method first tracks the locations of agents and then automatically discovers semantic hotspots in the scene. By enumerating the spatial/temporal relations between an agent's feet and the hotspots in a scene, we define a set of atomic actions, which in turn compose sub-events and events. The number and types of events performed by an agent are assumed to be driven by his/her social role. Using the grammar model induced by the composition rules, an adapted Earley parser is used to parse the trajectories into events, sub-events and atomic actions. Given the probabilistic output of event parsing, the roles of agents are predicted within a Bayesian inference framework. Experiments are carried out on a challenging 8.5-hour video from a surveillance camera in the lobby of a research lab. The video contains 7 different social roles: manager, researcher, developer, engineer, staff, visitor and mailman. Results show that our proposed method can predict the role of each agent with high precision.
UR - http://www.scopus.com/inward/record.url?scp=84863074227&partnerID=8YFLogxK
U2 - 10.1109/ICCVW.2011.6130422
DO - 10.1109/ICCVW.2011.6130422
M3 - Conference contribution
AN - SCOPUS:84863074227
SN - 9781467300629
T3 - Proceedings of the IEEE International Conference on Computer Vision
SP - 1456
EP - 1463
BT - 2011 IEEE International Conference on Computer Vision Workshops, ICCV Workshops 2011
T2 - 2011 IEEE International Conference on Computer Vision Workshops, ICCV Workshops 2011
Y2 - 6 November 2011 through 13 November 2011
ER -