Spatio-temporal attention mechanisms based model for collective activity recognition

Lihua Lu, Huijun Di, Yao Lu*, Lin Zhang, Shunzhou Wang

*Corresponding author of this work

Research output: Contribution to journal › Article › peer-review

17 Citations (Scopus)

Abstract

Collective activity recognition, which involves multiple people acting and interacting in a collective scene, is a widely studied but challenging problem in computer vision. The key to this task is how to efficiently explore the spatial and temporal evolution of collective activities. In this paper we propose a model based on spatio-temporal attention mechanisms to exploit spatial configurations and temporal dynamics in collective scenes. We present spatio-temporal attention mechanisms built from both deep RGB features and articulated human poses to capture the spatio-temporal evolution of individual actions and the collective activity. Benefiting from these attention mechanisms, our model learns to spatially capture unbalanced person–group interactions for each person, updating each individual's state based on these interactions, and to temporally assess the reliability of different video frames when predicting the final label of the collective activity. Furthermore, long-range temporal variability and consistency are handled by a two-stage Gated Recurrent Unit (GRU) network. Finally, to ensure effective training of our model, we jointly optimize losses at both the person and group levels to drive the learning process. Experimental results indicate that our method outperforms the state of the art on the Volleyball dataset. Additional ablation experiments and visual results demonstrate the effectiveness and practicability of the proposed model.
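The abstract describes two attention stages: a spatial one that weights each person's contribution to a group representation per frame, and a temporal one that weights frames by reliability before the final prediction. The following is a minimal numpy sketch of that two-stage attention pooling only; the function names, shapes, and the use of simple learned projection vectors as attention scorers are illustrative assumptions, not the authors' implementation (which also includes pose features, the two-stage GRU network, and joint person/group losses).

```python
import numpy as np

# Hypothetical sketch, not the paper's code: spatial attention pools N person
# features into one group feature per frame; temporal attention pools T frame
# features into one video feature. Shapes: (T frames, N persons, D dims).

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def spatio_temporal_attention(person_feats, w_spatial, w_temporal):
    """person_feats: (T, N, D); w_spatial, w_temporal: (D,) scoring vectors."""
    # Spatial attention: score each person per frame, then pool persons
    # into a group feature, so interactions can be weighted unequally.
    spatial_scores = person_feats @ w_spatial            # (T, N)
    alpha = softmax(spatial_scores, axis=1)              # person weights per frame
    group_feats = (alpha[..., None] * person_feats).sum(axis=1)   # (T, D)
    # Temporal attention: assess the reliability of each frame, then pool
    # frames into a single video-level feature for the activity label.
    temporal_scores = group_feats @ w_temporal           # (T,)
    beta = softmax(temporal_scores, axis=0)              # frame weights
    video_feat = (beta[:, None] * group_feats).sum(axis=0)        # (D,)
    return video_feat, alpha, beta

rng = np.random.default_rng(0)
T, N, D = 4, 6, 8
feats = rng.standard_normal((T, N, D))
video_feat, alpha, beta = spatio_temporal_attention(
    feats, rng.standard_normal(D), rng.standard_normal(D))
```

In a full model the scoring vectors would be learned end-to-end and the pooled features fed to classifiers at the person and group levels; here they are random placeholders just to show the weighting and pooling mechanics.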

Original language: English
Pages (from-to): 162-174
Number of pages: 13
Journal: Signal Processing: Image Communication
Volume: 74
DOI
Publication status: Published - May 2019
