TY - JOUR
T1 - Temporal-visual proposal graph network for temporal action detection
AU - Gan, Ming Gang
AU - Zhang, Yan
AU - Su, Shaowen
N1 - Publisher Copyright: © 2023, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
PY - 2023/11
Y1 - 2023/11
N2 - Temporal action detection is usually divided into two stages: temporal action proposal generation and proposal classification. Most methods consider the proposal classification stage as an action recognition task. However, compared with trimmed videos, proposals generally contain part of the ground-truth action, lacking enough semantic information to predict their categories precisely. In this paper, we propose a novel temporal-visual proposal graph (TVPG) module to acquire sufficient semantic information for action proposal classification. The module first adopts a proposal graph construction strategy to select valuable neighbor proposals for each proposal and constructs them into an action proposal graph. Then, it applies a temporal graph convolution network and a visual graph convolution network in parallel on the graph to improve proposal feature quality by obtaining action information from neighbors. In the temporal graph convolution network, we design a novel temporal graph convolution operation that embeds temporal position relation information into proposal features and extracts the information from other proposals by temporal position relations. Based on the TVPG module, we construct an action proposal classification model named the temporal-visual proposal graph network (TVPGN) and perform extensive experiments on two benchmarks. The results show that TVPGN achieves competitive performance on both datasets.
KW - Action proposal classification
KW - Action proposal graph
KW - Graph convolution
KW - Temporal action detection
UR - http://www.scopus.com/inward/record.url?scp=85168126193&partnerID=8YFLogxK
U2 - 10.1007/s10489-023-04947-0
DO - 10.1007/s10489-023-04947-0
M3 - Article
AN - SCOPUS:85168126193
SN - 0924-669X
VL - 53
SP - 26008
EP - 26026
JO - Applied Intelligence
JF - Applied Intelligence
IS - 21
ER -