Recognizing actions in images by fusing multiple body structure cues

Yang Li; Kan Li; Xinxin Wang

doi:10.1016/j.patcog.2020.107341

Yang Li, Kan Li^*, Xinxin Wang

^*Corresponding author for this work

School of Computer Science

Beijing Institute of Technology

Research output: Contribution to journal › Article › peer-review

17 Citations (Scopus)

Abstract

Although Convolutional Neural Networks (CNNs) have made substantial improvements in many computer vision tasks, there remains room for improvements in image-based action recognition due to the limited capability to exploit the body structure information. In this work, we propose a unified deep model to explicitly explore body structure information and fuse multiple body structure cues for robust action recognition in images. In order to fully explore the body structure information, we design the Body Structure Exploration sub-network. It generates two novel body structure cues, Structural Body Parts and Limb Angle Descriptor, which capture structure information of human bodies from the global and local perspectives respectively. And then, we design the Action Classification sub-network to fuse the predictions from multiple body structure cues to obtain precise results. Moreover, we integrate the two sub-networks into a unified model by sharing the bottom convolutional layers, which improves the computational efficiency in both training and testing stages. We comprehensively evaluate our network on the challenging image-based human action datasets, Pascal VOC 2012 Action and Stanford40. Our approach achieves 93.5% and 93.8% mAP respectively, which outperforms all recent approaches in this field.
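The abstract says that predictions from multiple body structure cues are fused, but it does not state the fusion rule. As a rough illustration only, the sketch below assumes one common choice, a weighted average of per-cue class probabilities (late fusion). The function names, the weights, and the example scores are all hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_predictions(cue_scores, weights=None):
    """Late fusion: weighted average of per-cue class probabilities.

    cue_scores: one list of raw class scores per cue, all the same length.
    weights: optional per-cue weights (an assumption; uniform by default).
    """
    if weights is None:
        weights = [1.0] * len(cue_scores)
    total_w = sum(weights)
    probs = [softmax(s) for s in cue_scores]
    n_classes = len(probs[0])
    return [
        sum(w * p[c] for w, p in zip(weights, probs)) / total_w
        for c in range(n_classes)
    ]


# Made-up scores for three action classes from the two cues named in the abstract.
structural_body_parts = [1.2, 1.0, 0.3]
limb_angle_descriptor = [0.4, 1.5, 0.2]

fused = fuse_predictions([structural_body_parts, limb_angle_descriptor])
predicted = max(range(len(fused)), key=fused.__getitem__)  # index of the top class
```

Averaging probabilities rather than raw scores keeps cues with different score ranges on a comparable scale; the model described in the paper may well combine its cues differently.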

Original language	English
Article number	107341
Journal	Pattern Recognition
Volume	104
DOI	https://doi.org/10.1016/j.patcog.2020.107341
Publication status	Published - Aug 2020

Cite this

@article{c0546296729b4b079b4cb03bfb44af1c,

title = "Recognizing actions in images by fusing multiple body structure cues",

abstract = "Although Convolutional Neural Networks (CNNs) have made substantial improvements in many computer vision tasks, there remains room for improvements in image-based action recognition due to the limited capability to exploit the body structure information. In this work, we propose a unified deep model to explicitly explore body structure information and fuse multiple body structure cues for robust action recognition in images. In order to fully explore the body structure information, we design the Body Structure Exploration sub-network. It generates two novel body structure cues, Structural Body Parts and Limb Angle Descriptor, which capture structure information of human bodies from the global and local perspectives respectively. And then, we design the Action Classification sub-network to fuse the predictions from multiple body structure cues to obtain precise results. Moreover, we integrate the two sub-networks into a unified model by sharing the bottom convolutional layers, which improves the computational efficiency in both training and testing stages. We comprehensively evaluate our network on the challenging image-based human action datasets, Pascal VOC 2012 Action and Stanford40. Our approach achieves 93.5\% and 93.8\% mAP respectively, which outperforms all recent approaches in this field.",

keywords = "Body structure cues, Convolutional neural network, Image-based action recognition",

author = "Yang Li and Kan Li and Xinxin Wang",

note = "Publisher Copyright: {\textcopyright} 2020 Elsevier Ltd",

year = "2020",

month = aug,

doi = "10.1016/j.patcog.2020.107341",

language = "English",

volume = "104",

journal = "Pattern Recognition",

issn = "0031-3203",

publisher = "Elsevier Ltd.",

}

TY - JOUR

T1 - Recognizing actions in images by fusing multiple body structure cues

AU - Li, Yang

AU - Li, Kan

AU - Wang, Xinxin

PY - 2020/8

Y1 - 2020/8

N2 - Although Convolutional Neural Networks (CNNs) have made substantial improvements in many computer vision tasks, there remains room for improvements in image-based action recognition due to the limited capability to exploit the body structure information. In this work, we propose a unified deep model to explicitly explore body structure information and fuse multiple body structure cues for robust action recognition in images. In order to fully explore the body structure information, we design the Body Structure Exploration sub-network. It generates two novel body structure cues, Structural Body Parts and Limb Angle Descriptor, which capture structure information of human bodies from the global and local perspectives respectively. And then, we design the Action Classification sub-network to fuse the predictions from multiple body structure cues to obtain precise results. Moreover, we integrate the two sub-networks into a unified model by sharing the bottom convolutional layers, which improves the computational efficiency in both training and testing stages. We comprehensively evaluate our network on the challenging image-based human action datasets, Pascal VOC 2012 Action and Stanford40. Our approach achieves 93.5% and 93.8% mAP respectively, which outperforms all recent approaches in this field.

AB - Although Convolutional Neural Networks (CNNs) have made substantial improvements in many computer vision tasks, there remains room for improvements in image-based action recognition due to the limited capability to exploit the body structure information. In this work, we propose a unified deep model to explicitly explore body structure information and fuse multiple body structure cues for robust action recognition in images. In order to fully explore the body structure information, we design the Body Structure Exploration sub-network. It generates two novel body structure cues, Structural Body Parts and Limb Angle Descriptor, which capture structure information of human bodies from the global and local perspectives respectively. And then, we design the Action Classification sub-network to fuse the predictions from multiple body structure cues to obtain precise results. Moreover, we integrate the two sub-networks into a unified model by sharing the bottom convolutional layers, which improves the computational efficiency in both training and testing stages. We comprehensively evaluate our network on the challenging image-based human action datasets, Pascal VOC 2012 Action and Stanford40. Our approach achieves 93.5% and 93.8% mAP respectively, which outperforms all recent approaches in this field.

KW - Body structure cues

KW - Convolutional neural network

KW - Image-based action recognition

UR - http://www.scopus.com/inward/record.url?scp=85083185409&partnerID=8YFLogxK

U2 - 10.1016/j.patcog.2020.107341

DO - 10.1016/j.patcog.2020.107341

M3 - Article

AN - SCOPUS:85083185409

SN - 0031-3203

VL - 104

JO - Pattern Recognition

JF - Pattern Recognition

M1 - 107341

ER -
