Temporal-visual proposal graph network for temporal action detection

Ming Gang Gan, Yan Zhang*, Shaowen Su

*Corresponding author for this work

Research output: Contribution to journal (Article, peer-reviewed)

Abstract

Temporal action detection is usually divided into two stages: temporal action proposal generation and proposal classification. Most methods treat the proposal classification stage as an action recognition task. However, unlike trimmed videos, proposals generally contain only part of the ground-truth action and therefore lack enough semantic information to predict their categories precisely. In this paper, we propose a novel temporal-visual proposal graph (TVPG) module that acquires sufficient semantic information for action proposal classification. The module first adopts a proposal graph construction strategy to select valuable neighbor proposals for each proposal and assembles them into an action proposal graph. It then applies a temporal graph convolution network and a visual graph convolution network to the graph in parallel, improving proposal feature quality by obtaining action information from neighbors. In the temporal graph convolution network, we design a novel temporal graph convolution operation that embeds temporal position relation information into proposal features and extracts information from other proposals according to those relations. Based on the TVPG module, we construct an action proposal classification model named the temporal-visual proposal graph network (TVPGN) and perform extensive experiments on two benchmarks. The results show that TVPGN achieves competitive performance on both datasets.
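The two-step idea in the abstract (build a proposal graph, then run a temporal and a visual graph convolution in parallel) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not state how neighbors are selected or how temporal position relations are embedded, so the sketch assumes top-k temporal IoU for neighbor selection, temporal-IoU edge weights for the temporal branch, and cosine-similarity edge weights for the visual branch. All function names and parameters (`build_proposal_graph`, `tvpg_sketch`, `k`, the weight matrices) are hypothetical.

```python
import numpy as np

def temporal_iou(segments):
    """Pairwise temporal IoU for an (N, 2) array of [start, end] segments."""
    start, end = segments[:, 0], segments[:, 1]
    inter = np.clip(np.minimum(end[:, None], end[None, :])
                    - np.maximum(start[:, None], start[None, :]), 0.0, None)
    length = end - start
    union = length[:, None] + length[None, :] - inter
    return inter / np.maximum(union, 1e-8)

def build_proposal_graph(segments, k=2):
    """Assumed neighbor rule: link each proposal to its k highest-tIoU peers."""
    iou = temporal_iou(segments)
    np.fill_diagonal(iou, -1.0)  # exclude self from the neighbor ranking
    neighbors = np.argsort(-iou, axis=1)[:, :k]
    adj = np.zeros_like(iou)
    rows = np.repeat(np.arange(len(segments)), k)
    adj[rows, neighbors.ravel()] = 1.0
    np.fill_diagonal(adj, 1.0)  # self-loops keep each proposal's own feature
    return adj

def graph_conv(features, weighted_adj, weight):
    """One row-normalized aggregation step followed by a linear map and ReLU."""
    row_sum = np.maximum(weighted_adj.sum(axis=1, keepdims=True), 1e-8)
    return np.maximum((weighted_adj / row_sum) @ features @ weight, 0.0)

def tvpg_sketch(segments, features, w_temporal, w_visual, k=2):
    """Run both branches on the same graph and concatenate their outputs."""
    adj = build_proposal_graph(segments, k)
    # Temporal branch (assumption): edges weighted by temporal overlap.
    temporal_adj = adj * temporal_iou(segments)
    # Visual branch (assumption): edges weighted by non-negative cosine similarity.
    unit = features / np.maximum(np.linalg.norm(features, axis=1, keepdims=True), 1e-8)
    visual_adj = adj * np.clip(unit @ unit.T, 0.0, None)
    return np.concatenate([graph_conv(features, temporal_adj, w_temporal),
                           graph_conv(features, visual_adj, w_visual)], axis=1)

# Toy usage: four proposals with 8-dimensional features.
rng = np.random.default_rng(0)
segments = np.array([[0.0, 10.0], [1.0, 11.0], [20.0, 30.0], [21.0, 31.0]])
features = rng.normal(size=(4, 8))
refined = tvpg_sketch(segments, features,
                      rng.normal(size=(8, 4)), rng.normal(size=(8, 4)), k=1)
```

In this toy setup each proposal's refined feature mixes in information from its overlapping neighbor, which is the intuition the abstract gives for why graph aggregation helps proposals that cover only part of an action.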

Original language: English
Pages (from-to): 26008-26026
Number of pages: 19
Journal: Applied Intelligence
Volume: 53
Issue number: 21
Publication status: Published - Nov 2023

Keywords

  • Action proposal classification
  • Action proposal graph
  • Graph convolution
  • Temporal action detection
