Abstract
Multi-person activity recognition is a challenging task because of the elusive interactions it involves. We take these interactions into account at two levels. At the individual level, each person's behaviour depends on both their own spatio-temporal features and the interactions propagated from other people in the scene. At the scene level, the multi-person activity is characterized by interactions between individuals' actions and the high-level activity. It is worth noting that interactions contribute unequally at both levels. To jointly model these diverse interactions, we propose a two-level attention-based interaction model built on two time-varying attention mechanisms. The individual-level attention mechanism, conditioned on pose features, exploits varying degrees of interaction among the individuals in a scene while updating their states at each time step. The scene-level attention mechanism uses an attention-based pooling strategy to capture varying levels of interaction between individuals' actions and the high-level activity. We ground our model in a modified two-stage Gated Recurrent Unit (GRU) network to handle long-range temporal variability and consistency. Our end-to-end trainable model takes as input a set of person detections in videos or image sequences and predicts multi-person activity labels. Experimental results demonstrate the comparable performance of our model and the effectiveness of our attention mechanisms.
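For concreteness, the following is a minimal sketch, assuming a PyTorch-style implementation, of how a two-level attention model over per-person GRUs could be wired together. The module names, feature dimensions, and the particular way the pose-conditioned attention weights are combined are illustrative assumptions based only on the abstract, not the authors' actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoLevelAttentionGRU(nn.Module):
    """Illustrative sketch: two-level attention over per-person GRU states."""

    def __init__(self, feat_dim, pose_dim, hidden_dim, num_activities):
        super().__init__()
        # Individual level: a shared GRU cell per person; attention scores
        # conditioned on pose features weight the interaction context.
        self.person_gru = nn.GRUCell(feat_dim + hidden_dim, hidden_dim)
        self.pose_attn = nn.Linear(pose_dim, 1)
        # Scene level: attention-based pooling of individual states,
        # followed by a scene GRU and an activity classifier.
        self.scene_attn = nn.Linear(hidden_dim, 1)
        self.scene_gru = nn.GRUCell(hidden_dim, hidden_dim)
        self.classifier = nn.Linear(hidden_dim, num_activities)

    def forward(self, feats, poses):
        # feats: (T, N, feat_dim) per-person appearance/motion features
        # poses: (T, N, pose_dim) per-person pose features
        T, N, _ = feats.shape
        h = feats.new_zeros(N, self.person_gru.hidden_size)   # person states
        s = feats.new_zeros(1, self.scene_gru.hidden_size)    # scene state
        for t in range(T):
            # Individual level: pose-conditioned attention weights decide how
            # strongly each person's state is propagated to the others.
            alpha = F.softmax(self.pose_attn(poses[t]).squeeze(-1), dim=0)   # (N,)
            context = (alpha.unsqueeze(-1) * h).sum(dim=0, keepdim=True)     # (1, hidden)
            gru_in = torch.cat([feats[t], context.expand(N, -1)], dim=-1)
            h = self.person_gru(gru_in, h)
            # Scene level: attention-based pooling over individual states,
            # so actions contribute unequally to the high-level activity.
            beta = F.softmax(self.scene_attn(h).squeeze(-1), dim=0)          # (N,)
            pooled = (beta.unsqueeze(-1) * h).sum(dim=0, keepdim=True)       # (1, hidden)
            s = self.scene_gru(pooled, s)
        return self.classifier(s)  # logits over multi-person activity labels
```

In this sketch the individual-level attention modulates how much interaction context enters each person's GRU update, while the scene-level attention pools the resulting states into a single scene representation before classification; the paper's actual two-stage GRU formulation may differ in detail.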
Original language | English |
---|---|
Pages (from-to) | 195-205 |
Number of pages | 11 |
Journal | Neurocomputing |
Volume | 322 |
DOIs | |
Publication status | Published - 17 Dec 2018 |
Keywords
- Attention mechanism
- Individual level
- Multi-person activity recognition
- Scene level