Attention-Based Multilevel Co-Occurrence Graph Convolutional LSTM for 3-D Action Recognition

Shihao Xu; Haocong Rao; Hong Peng; Xin Jiang*; Yi Guo*; Xiping Hu; Bin Hu*

doi:10.1109/JIOT.2020.3042986

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Action recognition is essential for many human-centered applications in the Internet of Things (IoT). In the Internet of Medical Things (IoMT) in particular, action recognition is important for surgical assistance, patient monitoring, and similar tasks. 3-D skeleton sequence-based action recognition has recently drawn broad attention. It is a challenging task that requires effective modeling of both intraframe skeleton representations and interframe temporal dynamics. Standard long short-term memory (LSTM)-based models are widely used for sequence modeling because of their long-term memory, yet they cannot fully model the relationships between different body joints or persons, which limits their ability to extract crucial co-occurrence features at different levels. To address this shortcoming, we propose an attention-based multilevel co-occurrence graph convolutional LSTM (AMCGC-LSTM). By integrating graph convolutional networks (GCNs) into LSTM, the proposed model leverages body structural information from skeletons and strengthens multilevel co-occurrence (MC) feature learning. Specifically, we first design a spatial attention (SA) module that enhances the features of key joints in the skeleton inputs. Second, we design MC memory units coupled with GCN that automatically model the spatial relationships between joints while simultaneously capturing co-occurrence features across joints, persons, and frames. Finally, we construct aggregated features of MCs (AFMCs) from the MC memory units to better encode intraframe action context, and use a concurrent LSTM (Co-LSTM) to further model their temporal dynamics for action recognition. Our model significantly outperforms mainstream methods on the NTU RGB+D 60/120 data sets, the mutual action subsets of NTU RGB+D 60/120, and the Northwestern-UCLA data set.
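The abstract describes integrating graph convolution into the LSTM gates, with spatial attention applied to the joints first, but it does not give the equations. The following is therefore only a minimal Python/NumPy sketch of a generic graph-convolutional LSTM cell under stated assumptions, not the authors' AMCGC-LSTM. It assumes a symmetrically normalized skeleton adjacency D^{-1/2}(A + I)D^{-1/2} (the standard GCN normalization, swapped in here), a per-joint softmax attention score applied as a (1 + alpha) reweighting, and a toy 5-joint skeleton rather than the 25-joint NTU RGB+D layout. All names (`normalized_adjacency`, `spatial_attention`, `GraphConvLSTMCell`, `encode_sequence`) are hypothetical, and the MC memory units, AFMCs, multi-person handling, and Co-LSTM are not modeled.

```python
import numpy as np


def normalized_adjacency(edges, num_joints):
    """Hypothetical helper: D^{-1/2} (A + I) D^{-1/2} for an undirected skeleton graph."""
    a = np.eye(num_joints)  # self-loops so each joint keeps its own features
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return d_inv_sqrt[:, None] * a * d_inv_sqrt[None, :]


def spatial_attention(x, w_att):
    """Assumed form: softmax score per joint, applied as a (1 + alpha) reweighting.

    x has shape (joints, channels); w_att has shape (channels,).
    """
    scores = x @ w_att
    alpha = np.exp(scores - scores.max())  # numerically stable softmax
    alpha /= alpha.sum()
    return x * (1.0 + alpha[:, None])


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class GraphConvLSTMCell:
    """Generic graph-convolutional LSTM cell (a sketch, not AMCGC-LSTM itself)."""

    def __init__(self, a_hat, in_dim, hid_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.a_hat = a_hat
        self.hid_dim = hid_dim
        # Input, forget, output, and candidate gates stacked along the last axis.
        self.w_x = rng.normal(0.0, 0.1, (in_dim, 4 * hid_dim))
        self.w_h = rng.normal(0.0, 0.1, (hid_dim, 4 * hid_dim))
        self.b = np.zeros(4 * hid_dim)

    def step(self, x, h, c):
        """x: (joints, in_dim); h, c: (joints, hid_dim)."""
        # Graph convolution A_hat @ X @ W replaces the dense transform of a plain LSTM.
        z = self.a_hat @ x @ self.w_x + self.a_hat @ h @ self.w_h + self.b
        i, f, o, g = np.split(z, 4, axis=1)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        return h, c


def encode_sequence(frames, cell, w_att):
    """frames: (T, joints, in_dim); returns the final per-joint hidden state."""
    h = np.zeros((frames.shape[1], cell.hid_dim))
    c = np.zeros_like(h)
    for x in frames:  # iterate over time steps
        h, c = cell.step(spatial_attention(x, w_att), h, c)
    return h


if __name__ == "__main__":
    # Hypothetical 5-joint toy skeleton: joint 1 is a hub linked to 0, 2, 3, 4.
    edges = [(0, 1), (1, 2), (1, 3), (1, 4)]
    a_hat = normalized_adjacency(edges, num_joints=5)
    cell = GraphConvLSTMCell(a_hat, in_dim=3, hid_dim=16)
    rng = np.random.default_rng(1)
    frames = rng.normal(size=(8, 5, 3))  # 8 frames, 5 joints, (x, y, z)
    h = encode_sequence(frames, cell, w_att=rng.normal(size=3))
    print(h.shape)  # (5, 16)
```

Computing every gate as `a_hat @ x @ w` makes each joint's gates depend on its skeletal neighbors, which is the structural information an ordinary fully connected LSTM gate ignores. In a trained model the weights would be learned rather than drawn at random.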

Original language	English
Pages (from-to)	15990-16001
Number of pages	12
Journal	IEEE Internet of Things Journal
Volume	8
Issue number	21
DOIs	https://doi.org/10.1109/JIOT.2020.3042986
Publication status	Published - 1 Nov 2021
Externally published	Yes

Keywords

3-D action recognition
Internet of Medical Things (IoMT)
graph convolution
multilevel co-occurrence (MC)
spatial attention (SA)

Cite this

@article{f6f5231cbbb242bf9d994fd8f43903a9,
  title = "Attention-Based Multilevel Co-Occurrence Graph Convolutional {LSTM} for {3-D} Action Recognition",
  keywords = "3-D action recognition, Internet of Medical Things (IoMT), graph convolution, multilevel co-occurrence (MC), spatial attention (SA)",
  author = "Shihao Xu and Haocong Rao and Hong Peng and Xin Jiang and Yi Guo and Xiping Hu and Bin Hu",
  note = "Publisher Copyright: {\textcopyright} 2014 IEEE.",
  year = "2021",
  month = nov,
  day = "1",
  doi = "10.1109/JIOT.2020.3042986",
  language = "English",
  volume = "8",
  pages = "15990--16001",
  journal = "IEEE Internet of Things Journal",
  issn = "2327-4662",
  publisher = "Institute of Electrical and Electronics Engineers Inc.",
  number = "21",
}

TY  - JOUR
T1  - Attention-Based Multilevel Co-Occurrence Graph Convolutional LSTM for 3-D Action Recognition
AU  - Xu, Shihao
AU  - Rao, Haocong
AU  - Peng, Hong
AU  - Jiang, Xin
AU  - Guo, Yi
AU  - Hu, Xiping
AU  - Hu, Bin
PY  - 2021/11/1
Y1  - 2021/11/1

KW  - 3-D action recognition
KW  - Internet of Medical Things (IoMT)
KW  - graph convolution
KW  - multilevel co-occurrence (MC)
KW  - spatial attention (SA)
UR  - http://www.scopus.com/inward/record.url?scp=85097932603&partnerID=8YFLogxK
U2  - 10.1109/JIOT.2020.3042986
DO  - 10.1109/JIOT.2020.3042986
M3  - Article
AN  - SCOPUS:85097932603
SN  - 2327-4662
VL  - 8
SP  - 15990
EP  - 16001
JO  - IEEE Internet of Things Journal
JF  - IEEE Internet of Things Journal
IS  - 21
ER  -
