AdapNet: Adaptability Decomposing Encoder-Decoder Network for Weakly Supervised Action Recognition and Localization

Xiao Yu Zhang, Changsheng Li, Haichao Shi*, Xiaobin Zhu, Peng Li, Jing Dong

*Corresponding author for this work

Research output: Contribution to journalArticlepeer-review

35 Citations (Scopus)

Abstract

The point process is a solid framework to model sequential data, such as videos, by exploring the underlying relevance. As a challenging problem for high-level video understanding, weakly supervised action recognition and localization in untrimmed videos have attracted intensive research attention. Knowledge transfer by leveraging the publicly available trimmed videos as external guidance is a promising attempt to make up for the coarse-grained video-level annotation and improve the generalization performance. However, unconstrained knowledge transfer may bring about irrelevant noise and jeopardize the learning model. This article proposes a novel adaptability decomposing encoder-decoder network to transfer reliable knowledge between the trimmed and untrimmed videos for action recognition and localization by bidirectional point process modeling, given only video-level annotations. By decomposing the original features into the domain-adaptable and domain-specific ones based on their adaptability, trimmed-untrimmed knowledge transfer can be safely confined within a more coherent subspace. An encoder-decoder-based structure is carefully designed and jointly optimized to facilitate effective action classification and temporal localization. Extensive experiments are conducted on two benchmark data sets (i.e., THUMOS14 and ActivityNet1.3), and the experimental results clearly corroborate the efficacy of our method.

Original languageEnglish
Pages (from-to)1852-1863
Number of pages12
JournalIEEE Transactions on Neural Networks and Learning Systems
Volume34
Issue number4
DOIs
Publication statusPublished - 1 Apr 2023

Keywords

  • Action recognition
  • encodera-decoder
  • knowledge transfer
  • point process
  • temporal action localization

Fingerprint

Dive into the research topics of 'AdapNet: Adaptability Decomposing Encoder-Decoder Network for Weakly Supervised Action Recognition and Localization'. Together they form a unique fingerprint.

Cite this