Multi-Branch Spatial-Temporal Network for Action Recognition

Yingying Wang, Wei Li*, Ran Tao

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

13 Citations (Scopus)

Abstract

Human action recognition based on deep-learning methods has received increasing attention and developed rapidly. However, current methods suffer from the confusion caused by convolving over time and space independently, are limited to processing short sequences, and are restricted to single-temporal-scale modeling. The key to precisely classifying actions is to capture both appearance and motion throughout entire videos. To this end, a multi-branch spatial-temporal network (MSTN) is proposed, consisting of a multi-branch deep network and a long-term feature (LTF) layer. The benefits of the proposed MSTN are twofold: (a) the multi-branch spatial-temporal network encodes spatial and temporal information simultaneously, and (b) the LTF layer aggregates a video-level representation across multiple temporal scales. Evaluations on two action datasets and comparisons with several state-of-the-art approaches demonstrate the effectiveness of the proposed network.
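To illustrate the multi-temporal-scale aggregation idea behind an LTF-style layer, here is a minimal NumPy sketch: per-frame features are mean-pooled over segments at several temporal scales and concatenated into a video-level vector. The function name, scale choices, and pooling scheme are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def long_term_feature(frame_feats, scales=(1, 2, 4)):
    """Aggregate per-frame features into a video-level vector.

    frame_feats: (T, D) array of per-frame descriptors.
    scales: for each scale s, the frame sequence is split into s
            contiguous segments and each segment is mean-pooled.
    Returns a (D * sum(scales),) concatenated representation.

    Note: a sketch of multi-scale temporal pooling only; the
    paper's LTF layer may aggregate features differently.
    """
    T, _ = frame_feats.shape
    pooled = []
    for s in scales:
        # split frame indices into s roughly equal contiguous segments
        for seg in np.array_split(np.arange(T), s):
            pooled.append(frame_feats[seg].mean(axis=0))
    return np.concatenate(pooled)

# toy example: 8 frames with 16-dim features
feats = np.random.rand(8, 16)
video_repr = long_term_feature(feats)
print(video_repr.shape)  # (112,) = 16 * (1 + 2 + 4)
```

At scale 1 the pooling is a global average; finer scales preserve coarse temporal ordering, so the concatenated vector carries information at several temporal granularities.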

Original language: English
Article number: 8832232
Pages (from-to): 1556-1560
Number of pages: 5
Journal: IEEE Signal Processing Letters
Volume: 26
Issue number: 10
DOIs
Publication status: Published - Oct 2019

Keywords

  • Action recognition
  • deep learning
  • long-term feature layer
  • spatial-temporal network
