TY - GEN
T1 - Dynamic Skeleton Association Transformer for Dyadic Interaction Action Recognition
AU - Liu, Zixian
AU - Zhang, Longfei
AU - Zhao, Xiaokun
AU - Wang, Yixuan
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.
PY - 2025
Y1 - 2025
N2 - Since the GCN was proposed to represent skeleton data as graphs, it has been the primary method for skeleton-based human action recognition. However, when dealing with interaction skeleton sequences, current GCN-based methods do not dynamically update the connections between the skeleton joints of two persons and therefore cannot extract interaction features well. The self-attention module of the Transformer can effectively capture correlations between skeleton sequences. We propose a novel method, the Dynamic Skeleton Association Transformer (DSAT), for dyadic interaction action recognition. It dynamically updates the interaction adjacency matrix by combining the spatial attention features and geometric spatial distances of the two skeleton sequences, capturing the spatial interaction relationship between the two persons' skeletons. We then use spatial self-attention to extract interaction relationships both between different individuals and within the same individual. We also improve the temporal self-attention module according to the density of interactive events, extracting correlations of the same skeleton joint across different frames. With this strategy, our model can more effectively recognize interactive behaviors that are dense in time and space. We conduct extensive experiments on the interaction subsets of the SBU, NTU-RGB+D, and NTU-RGB+D 120 benchmark datasets to verify the effectiveness of our method.
AB - Since the GCN was proposed to represent skeleton data as graphs, it has been the primary method for skeleton-based human action recognition. However, when dealing with interaction skeleton sequences, current GCN-based methods do not dynamically update the connections between the skeleton joints of two persons and therefore cannot extract interaction features well. The self-attention module of the Transformer can effectively capture correlations between skeleton sequences. We propose a novel method, the Dynamic Skeleton Association Transformer (DSAT), for dyadic interaction action recognition. It dynamically updates the interaction adjacency matrix by combining the spatial attention features and geometric spatial distances of the two skeleton sequences, capturing the spatial interaction relationship between the two persons' skeletons. We then use spatial self-attention to extract interaction relationships both between different individuals and within the same individual. We also improve the temporal self-attention module according to the density of interactive events, extracting correlations of the same skeleton joint across different frames. With this strategy, our model can more effectively recognize interactive behaviors that are dense in time and space. We conduct extensive experiments on the interaction subsets of the SBU, NTU-RGB+D, and NTU-RGB+D 120 benchmark datasets to verify the effectiveness of our method.
KW - Skeleton-based interaction recognition
KW - Transformer
KW - Action recognition
KW - Graph convolutional networks
UR - http://www.scopus.com/inward/record.url?scp=85209176104&partnerID=8YFLogxK
U2 - 10.1007/978-981-97-8511-7_39
DO - 10.1007/978-981-97-8511-7_39
M3 - Conference contribution
AN - SCOPUS:85209176104
SN - 9789819785100
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 554
EP - 569
BT - Pattern Recognition and Computer Vision - 7th Chinese Conference, PRCV 2024, Proceedings
A2 - Lin, Zhouchen
A2 - Zha, Hongbin
A2 - Cheng, Ming-Ming
A2 - He, Ran
A2 - Liu, Cheng-Lin
A2 - Ubul, Kurban
A2 - Silamu, Wushouer
A2 - Zhou, Jie
PB - Springer Science and Business Media Deutschland GmbH
T2 - 7th Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2024
Y2 - 18 October 2024 through 20 October 2024
ER -