TY - JOUR
T1 - Recognizing actions in images by fusing multiple body structure cues
AU - Li, Yang
AU - Li, Kan
AU - Wang, Xinxin
N1 - Publisher Copyright:
© 2020 Elsevier Ltd
PY - 2020/8
Y1 - 2020/8
N2 - Although Convolutional Neural Networks (CNNs) have made substantial improvements in many computer vision tasks, there remains room for improvements in image-based action recognition due to the limited capability to exploit the body structure information.In this work, we propose a unified deep model to explicitly explore body structure information and fuse multiple body structure cues for robust action recognition in images.In order to fully explore the body structure information, we design the Body Structure Exploration sub-network.It generates two novel body structure cues, Structural Body Parts and Limb Angle Descriptor, which capture structure information of human bodies from the global and local perspectives respectively. And then, we design the Action Classification sub-network to fuse the predictions from multiple body structure cues to obtain precise results. Moreover, we integrate the two sub-networks into a unified model by sharing the bottom convolutional layers, which improves the computational efficiency in both training and testing stages. We comprehensively evaluate our network on the challenging image-based human action datasets, Pascal VOC 2012 Action and Stanford40. Our approach achieves 93.5% and 93.8% mAP respectively, which outperforms all recent approaches in this field.
AB - Although Convolutional Neural Networks (CNNs) have made substantial improvements in many computer vision tasks, there remains room for improvements in image-based action recognition due to the limited capability to exploit the body structure information.In this work, we propose a unified deep model to explicitly explore body structure information and fuse multiple body structure cues for robust action recognition in images.In order to fully explore the body structure information, we design the Body Structure Exploration sub-network.It generates two novel body structure cues, Structural Body Parts and Limb Angle Descriptor, which capture structure information of human bodies from the global and local perspectives respectively. And then, we design the Action Classification sub-network to fuse the predictions from multiple body structure cues to obtain precise results. Moreover, we integrate the two sub-networks into a unified model by sharing the bottom convolutional layers, which improves the computational efficiency in both training and testing stages. We comprehensively evaluate our network on the challenging image-based human action datasets, Pascal VOC 2012 Action and Stanford40. Our approach achieves 93.5% and 93.8% mAP respectively, which outperforms all recent approaches in this field.
KW - Body structure cues
KW - Convolutional neural network
KW - Image-based action recognition
UR - http://www.scopus.com/inward/record.url?scp=85083185409&partnerID=8YFLogxK
U2 - 10.1016/j.patcog.2020.107341
DO - 10.1016/j.patcog.2020.107341
M3 - Article
AN - SCOPUS:85083185409
SN - 0031-3203
VL - 104
JO - Pattern Recognition
JF - Pattern Recognition
M1 - 107341
ER -