Inferring social roles in long timespan video sequence

Jiangen Zhang; Wenze Hu; Benjamin Yao; Yongtian Wang; Song Chun Zhu

doi:10.1109/ICCVW.2011.6130422


Jiangen Zhang^*, Wenze Hu, Benjamin Yao, Yongtian Wang, Song Chun Zhu

^*Corresponding author for this work

School of Optics and Photonics

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-review

6 Citations (Scopus)

Abstract

In this paper, we present a method for inferring the social roles of agents (persons) from their daily activities in long surveillance video sequences. We define activities as interactions between an agent's position and semantic hotspots within the scene. Given a surveillance video, our method first tracks the locations of agents and then automatically discovers semantic hotspots in the scene. By enumerating the spatial/temporal relations between an agent's feet and the hotspots in a scene, we define a set of atomic actions, which in turn compose sub-events and events. The number and types of events performed by an agent are assumed to be driven by his/her social role. Using the grammar model induced by these composition rules, an adapted Earley parser parses the trajectories into events, sub-events and atomic actions. Given the probabilistic event outputs, the roles of agents are then predicted within a Bayesian inference framework. Experiments are carried out on a challenging 8.5-hour video from a surveillance camera in the lobby of a research lab. The video contains 7 different social roles: manager, researcher, developer, engineer, staff, visitor and mailman. Results show that our proposed method can predict the role of each agent with high precision.
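The abstract describes parsing an agent's trajectory, once it has been converted into atomic actions, into sub-events and events with an adapted Earley parser. As a rough illustration of the underlying chart-parsing idea, here is a minimal sketch of a plain, non-probabilistic Earley recognizer; it is not the authors' adapted parser. The grammar, the nonterminals (`Day`, `Event`, `CheckMail`, `Stay`, `EnterOffice`) and the atomic-action names such as `approach_mailbox` are hypothetical placeholders, and the sketch assumes a grammar with no empty (epsilon) productions.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical grammar: nonterminal -> list of right-hand sides.
# Any symbol that is not a key is treated as a terminal (an atomic action).
Grammar = Dict[str, List[Tuple[str, ...]]]
# An Earley item: (lhs, rhs, dot position, origin chart index).
Item = Tuple[str, Tuple[str, ...], int, int]

GRAMMAR: Grammar = {
    "Day": [("Event",), ("Event", "Day")],  # a day is one or more events
    "Event": [("CheckMail",), ("EnterOffice",)],
    "CheckMail": [("approach_mailbox", "Stay", "leave_mailbox")],
    "Stay": [("stay_mailbox",), ("stay_mailbox", "Stay")],  # sub-event: one or more stays
    "EnterOffice": [("approach_door", "pass_door")],
}


def earley_recognize(tokens: List[str], grammar: Grammar, start: str) -> bool:
    """Return True if `tokens` can be derived from `start`.
    Assumes the grammar has no empty (epsilon) productions."""
    chart: List[Set[Item]] = [set() for _ in range(len(tokens) + 1)]
    for rhs in grammar[start]:
        chart[0].add((start, rhs, 0, 0))

    for i in range(len(tokens) + 1):
        agenda = list(chart[i])
        while agenda:
            lhs, rhs, dot, origin = agenda.pop()
            if dot < len(rhs):
                symbol = rhs[dot]
                if symbol in grammar:
                    # Predict: expand the nonterminal expected after the dot.
                    for prod in grammar[symbol]:
                        new = (symbol, prod, 0, i)
                        if new not in chart[i]:
                            chart[i].add(new)
                            agenda.append(new)
                elif i < len(tokens) and tokens[i] == symbol:
                    # Scan: the next atomic action matches the expected terminal.
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
            else:
                # Complete: advance every item in chart[origin] waiting for `lhs`.
                for p_lhs, p_rhs, p_dot, p_origin in list(chart[origin]):
                    if p_dot < len(p_rhs) and p_rhs[p_dot] == lhs:
                        new = (p_lhs, p_rhs, p_dot + 1, p_origin)
                        if new not in chart[i]:
                            chart[i].add(new)
                            agenda.append(new)

    return any(
        lhs == start and dot == len(rhs) and origin == 0
        for lhs, rhs, dot, origin in chart[len(tokens)]
    )


if __name__ == "__main__":
    actions = ["approach_mailbox", "stay_mailbox", "stay_mailbox",
               "leave_mailbox", "approach_door", "pass_door"]
    print(earley_recognize(actions, GRAMMAR, "Day"))  # True
    print(earley_recognize(["approach_mailbox", "leave_mailbox"], GRAMMAR, "Day"))  # False
```

A stochastic variant would typically attach a probability to each production and keep the best-scoring derivation per chart item; this sketch only decides whether an action sequence is grammatical.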
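The final step in the abstract, predicting a role from the events an agent performs, can be sketched as a Bayesian posterior over roles. Purely for illustration, this sketch assumes that each role produces each event type a Poisson-distributed number of times and that event types are independent given the role, so both the number and the types of events affect the result: log P(r | n) = log P(r) + Σ_e [n_e log λ_{r,e} − λ_{r,e} − log n_e!] + const. The event types, the rates λ, the uniform prior and the helper names are hypothetical; only three of the seven roles are shown, and the authors' actual likelihood model may differ.

```python
import math
from collections import Counter
from typing import Dict, List

# Hypothetical expected count of each event type per role (Poisson rates);
# illustrative values only, not taken from the paper.
EVENT_RATES: Dict[str, Dict[str, float]] = {
    "mailman":    {"enter_lobby": 2.0, "visit_mailbox": 3.0, "visit_office": 0.5},
    "researcher": {"enter_lobby": 2.0, "visit_mailbox": 0.5, "visit_office": 4.0},
    "visitor":    {"enter_lobby": 1.0, "visit_mailbox": 0.1, "visit_office": 1.0},
}
# Hypothetical uniform prior P(role).
PRIOR: Dict[str, float] = {role: 1.0 / len(EVENT_RATES) for role in EVENT_RATES}


def log_poisson(n: int, rate: float) -> float:
    """log P(n | rate) for a Poisson distribution."""
    return n * math.log(rate) - rate - math.lgamma(n + 1)


def role_posterior(events: List[str],
                   rates: Dict[str, Dict[str, float]] = EVENT_RATES,
                   prior: Dict[str, float] = PRIOR) -> Dict[str, float]:
    """Return P(role | observed events) under the independent-Poisson assumption.
    Event types missing from the rate table are ignored."""
    counts = Counter(events)
    log_scores = {}
    for role, role_rates in rates.items():
        score = math.log(prior[role])
        # Every modelled event type contributes, including unobserved ones (n = 0).
        for event, rate in role_rates.items():
            score += log_poisson(counts.get(event, 0), rate)
        log_scores[role] = score
    # Normalise with the log-sum-exp trick for numerical stability.
    m = max(log_scores.values())
    z = sum(math.exp(s - m) for s in log_scores.values())
    return {role: math.exp(s - m) / z for role, s in log_scores.items()}


if __name__ == "__main__":
    observed = ["enter_lobby", "visit_mailbox", "visit_mailbox",
                "visit_mailbox", "enter_lobby"]
    posterior = role_posterior(observed)
    print(max(posterior, key=posterior.get))  # mailman
```

Working in log space keeps the product of many small likelihoods from underflowing when an agent performs many events over a long video.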

Original language	English
Host publication title	2011 IEEE International Conference on Computer Vision Workshops, ICCV Workshops 2011
Pages	1456-1463
Number of pages	8
DOI	https://doi.org/10.1109/ICCVW.2011.6130422
Publication status	Published - 2011
Event	2011 IEEE International Conference on Computer Vision Workshops, ICCV Workshops 2011 - Barcelona, Spain. Duration: 6 Nov 2011 → 13 Nov 2011

Publication series

Name	Proceedings of the IEEE International Conference on Computer Vision

Conference

Conference	2011 IEEE International Conference on Computer Vision Workshops, ICCV Workshops 2011
Country/Territory	Spain
City	Barcelona
Period	6/11/11 → 13/11/11


Cite this

Zhang, J., Hu, W., Yao, B., Wang, Y., & Zhu, S. C. (2011). Inferring social roles in long timespan video sequence. In 2011 IEEE International Conference on Computer Vision Workshops, ICCV Workshops 2011 (pp. 1456-1463). Article 6130422 (Proceedings of the IEEE International Conference on Computer Vision). https://doi.org/10.1109/ICCVW.2011.6130422

@inproceedings{d0d13d1c39044d9eaa48a6471fdb232b,

title = "Inferring social roles in long timespan video sequence",

abstract = "In this paper, we present a method for inferring social roles of agents (persons) from their daily activities in long surveillance video sequences. We define activities as interactions between an agent's position and semantic hotspots within the scene. Given a surveillance video, our method first tracks the locations of agents then automatically discovers semantic hotspots in the scene. By enumerating spatial/temporal locations between an agent's feet and hotspots in a scene, we define a set of atomic actions, which in turn compose sub-events and events. The numbers and types of events performed by an agent are assumed to be driven by his/her social role. With the grammar model induced by composition rules, an adapted Earley parser algorithm is used to parse the trajectories into events, sub-events and atomic actions. With probabilistic output of events, the roles of agents can be predicted under the Bayesian inference framework. Experiments are carried out on a challenging 8.5 hours video from a surveillance camera in the lobby of a research lab. The video contains 7 different social roles including manager, researcher, developer, engineer, staff, visitor and mailman. Results show that our proposed method can predict the role of each agent with high precision.",

author = "Jiangen Zhang and Wenze Hu and Benjamin Yao and Yongtian Wang and Zhu, {Song Chun}",

year = "2011",

doi = "10.1109/ICCVW.2011.6130422",

language = "English",

isbn = "9781467300629",

series = "Proceedings of the IEEE International Conference on Computer Vision",

pages = "1456--1463",

booktitle = "2011 IEEE International Conference on Computer Vision Workshops, ICCV Workshops 2011",

note = "2011 IEEE International Conference on Computer Vision Workshops, ICCV Workshops 2011 ; Conference date: 06-11-2011 Through 13-11-2011",

}

Zhang, J, Hu, W, Yao, B, Wang, Y & Zhu, SC 2011, Inferring social roles in long timespan video sequence. in 2011 IEEE International Conference on Computer Vision Workshops, ICCV Workshops 2011., 6130422, Proceedings of the IEEE International Conference on Computer Vision, pp. 1456-1463, 2011 IEEE International Conference on Computer Vision Workshops, ICCV Workshops 2011, Barcelona, Spain, 6/11/11. https://doi.org/10.1109/ICCVW.2011.6130422

TY  - GEN

T1  - Inferring social roles in long timespan video sequence

AU  - Zhang, Jiangen

AU  - Hu, Wenze

AU  - Yao, Benjamin

AU  - Wang, Yongtian

AU  - Zhu, Song Chun

PY  - 2011

Y1  - 2011

N2  - In this paper, we present a method for inferring social roles of agents (persons) from their daily activities in long surveillance video sequences. We define activities as interactions between an agent's position and semantic hotspots within the scene. Given a surveillance video, our method first tracks the locations of agents then automatically discovers semantic hotspots in the scene. By enumerating spatial/temporal locations between an agent's feet and hotspots in a scene, we define a set of atomic actions, which in turn compose sub-events and events. The numbers and types of events performed by an agent are assumed to be driven by his/her social role. With the grammar model induced by composition rules, an adapted Earley parser algorithm is used to parse the trajectories into events, sub-events and atomic actions. With probabilistic output of events, the roles of agents can be predicted under the Bayesian inference framework. Experiments are carried out on a challenging 8.5 hours video from a surveillance camera in the lobby of a research lab. The video contains 7 different social roles including manager, researcher, developer, engineer, staff, visitor and mailman. Results show that our proposed method can predict the role of each agent with high precision.

AB  - In this paper, we present a method for inferring social roles of agents (persons) from their daily activities in long surveillance video sequences. We define activities as interactions between an agent's position and semantic hotspots within the scene. Given a surveillance video, our method first tracks the locations of agents then automatically discovers semantic hotspots in the scene. By enumerating spatial/temporal locations between an agent's feet and hotspots in a scene, we define a set of atomic actions, which in turn compose sub-events and events. The numbers and types of events performed by an agent are assumed to be driven by his/her social role. With the grammar model induced by composition rules, an adapted Earley parser algorithm is used to parse the trajectories into events, sub-events and atomic actions. With probabilistic output of events, the roles of agents can be predicted under the Bayesian inference framework. Experiments are carried out on a challenging 8.5 hours video from a surveillance camera in the lobby of a research lab. The video contains 7 different social roles including manager, researcher, developer, engineer, staff, visitor and mailman. Results show that our proposed method can predict the role of each agent with high precision.

UR  - http://www.scopus.com/inward/record.url?scp=84863074227&partnerID=8YFLogxK

U2  - 10.1109/ICCVW.2011.6130422

DO  - 10.1109/ICCVW.2011.6130422

M3  - Conference contribution

AN  - SCOPUS:84863074227

SN  - 9781467300629

T3  - Proceedings of the IEEE International Conference on Computer Vision

SP  - 1456

EP  - 1463

BT  - 2011 IEEE International Conference on Computer Vision Workshops, ICCV Workshops 2011

T2  - 2011 IEEE International Conference on Computer Vision Workshops, ICCV Workshops 2011

Y2  - 6 November 2011 through 13 November 2011

ER  - 
