Recognizing actions in images by fusing multiple body structure cues

Yang Li, Kan Li*, Xinxin Wang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

17 Citations (Scopus)

Abstract

Although Convolutional Neural Networks (CNNs) have brought substantial improvements to many computer vision tasks, there remains room for improvement in image-based action recognition because of their limited capability to exploit body structure information. In this work, we propose a unified deep model that explicitly explores body structure information and fuses multiple body structure cues for robust action recognition in images. To fully explore the body structure information, we design the Body Structure Exploration sub-network. It generates two novel body structure cues, Structural Body Parts and Limb Angle Descriptor, which capture the structure information of human bodies from the global and local perspectives, respectively. We then design the Action Classification sub-network, which fuses the predictions from the multiple body structure cues to obtain precise results. Moreover, we integrate the two sub-networks into a unified model by sharing the bottom convolutional layers, which improves computational efficiency in both the training and testing stages. We comprehensively evaluate our network on two challenging image-based human action datasets, Pascal VOC 2012 Action and Stanford40. Our approach achieves 93.5% and 93.8% mAP, respectively, outperforming all recent approaches in this field.
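
The abstract does not give the exact form of the Limb Angle Descriptor or of the fusion step, so the following is only a minimal sketch of the general idea, assuming 2D joint coordinates and a simple weighted average of per-class scores. The function names `limb_angle` and `fuse_predictions`, the keypoint format, and the equal default weights are hypothetical and are not taken from the paper.

```python
import math


def limb_angle(joint_a, joint_b):
    """Orientation (degrees) of the limb running from joint_a to joint_b.

    Assumption: joints are (x, y) image coordinates; the angle is measured
    from the positive x-axis. The paper's descriptor may be defined differently.
    """
    dx = joint_b[0] - joint_a[0]
    dy = joint_b[1] - joint_a[1]
    return math.degrees(math.atan2(dy, dx))


def fuse_predictions(cue_scores, weights=None):
    """Late fusion: weighted average of per-class scores from several cues.

    Assumption: each cue yields one score per action class; equal weights
    are used when none are given. This is a generic fusion rule, not
    necessarily the one used by the Action Classification sub-network.
    """
    n_cues = len(cue_scores)
    if weights is None:
        weights = [1.0 / n_cues] * n_cues
    n_classes = len(cue_scores[0])
    return [
        sum(w * scores[c] for w, scores in zip(weights, cue_scores))
        for c in range(n_classes)
    ]


if __name__ == "__main__":
    # Hypothetical elbow and wrist positions for one forearm.
    print(limb_angle((0.0, 0.0), (1.0, 1.0)))
    # Hypothetical two-class scores from two different body structure cues.
    print(fuse_predictions([[0.2, 0.8], [0.6, 0.4]]))
```

In this sketch the cues contribute equally; a learned fusion layer would replace the fixed weights.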

Original language: English
Article number: 107341
Journal: Pattern Recognition
Volume: 104
Publication status: Published - Aug 2020

Keywords

  • Body structure cues
  • Convolutional neural network
  • Image-based action recognition
