TY - JOUR
T1 - Spatio-temporal attention mechanisms based model for collective activity recognition
AU - Lu, Lihua
AU - Di, Huijun
AU - Lu, Yao
AU - Zhang, Lin
AU - Wang, Shunzhou
N1 - Publisher Copyright:
© 2019
PY - 2019/5
Y1 - 2019/5
N2 - Collective activity recognition, which involves multiple people acting and interacting in a collective scenario, is a widely studied but challenging problem in computer vision. The key to this task is how to efficiently explore the spatial and temporal evolution of collective activities. In this paper we propose a model based on spatio-temporal attention mechanisms to exploit spatial configurations and temporal dynamics in collective scenes. We present spatio-temporal attention mechanisms built from both deep RGB features and articulated human poses to capture the spatio-temporal evolution of individuals’ actions and the collective activity. Benefiting from these attention mechanisms, our model learns to spatially capture unbalanced person–group interactions for each person while updating each individual’s state based on these interactions, and to temporally assess the reliability of different video frames when predicting the final label of the collective activity. Furthermore, long-range temporal variability and consistency are handled by a two-stage Gated Recurrent Units (GRUs) network. Finally, to ensure effective training of our model, we jointly optimize losses at both the person and group levels to drive the learning process. Experimental results indicate that our method outperforms the state of the art on the Volleyball dataset. Further verification experiments and visual results demonstrate the effectiveness and practicability of the proposed model.
AB - Collective activity recognition, which involves multiple people acting and interacting in a collective scenario, is a widely studied but challenging problem in computer vision. The key to this task is how to efficiently explore the spatial and temporal evolution of collective activities. In this paper we propose a model based on spatio-temporal attention mechanisms to exploit spatial configurations and temporal dynamics in collective scenes. We present spatio-temporal attention mechanisms built from both deep RGB features and articulated human poses to capture the spatio-temporal evolution of individuals’ actions and the collective activity. Benefiting from these attention mechanisms, our model learns to spatially capture unbalanced person–group interactions for each person while updating each individual’s state based on these interactions, and to temporally assess the reliability of different video frames when predicting the final label of the collective activity. Furthermore, long-range temporal variability and consistency are handled by a two-stage Gated Recurrent Units (GRUs) network. Finally, to ensure effective training of our model, we jointly optimize losses at both the person and group levels to drive the learning process. Experimental results indicate that our method outperforms the state of the art on the Volleyball dataset. Further verification experiments and visual results demonstrate the effectiveness and practicability of the proposed model.
KW - Attention mechanisms
KW - Gated Recurrent Units (GRUs) network
KW - Multi-modal data
KW - Multi-person activity recognition
KW - Spatio-temporal model
UR - http://www.scopus.com/inward/record.url?scp=85062449442&partnerID=8YFLogxK
U2 - 10.1016/j.image.2019.02.012
DO - 10.1016/j.image.2019.02.012
M3 - Article
AN - SCOPUS:85062449442
SN - 0923-5965
VL - 74
SP - 162
EP - 174
JO - Signal Processing: Image Communication
JF - Signal Processing: Image Communication
ER -