TY - JOUR
T1 - A Hierarchical Video Description for Complex Activity Understanding
AU - Liu, Cuiwei
AU - Wu, Xinxiao
AU - Jia, Yunde
N1 - Publisher Copyright:
© 2016, Springer Science+Business Media New York.
PY - 2016/6/1
Y1 - 2016/6/1
N2 - This paper addresses the challenging problem of complex human activity understanding from long videos. Towards this goal, we propose a hierarchical description of an activity video that specifies which activity occurs, what atomic actions compose it, and when each atomic action happens in the video. In our work, each complex activity is characterized as a composition of simple motion units (called atomic actions), and different atomic actions are explained by different video segments. We develop a latent discriminative structural model that detects the complex activity and its atomic actions while simultaneously learning the temporal structure of the atomic actions. A segment-annotation mapping matrix is introduced to relate video segments to their associated atomic actions, allowing different video segments to explain different atomic actions. The segment-annotation mapping matrix is treated as latent information in the model, since its ground truth is available during neither training nor testing. Moreover, we present a semi-supervised learning method that automatically predicts the atomic action labels of unlabeled training videos when labeled training data is limited, which greatly alleviates the laborious and time-consuming annotation of atomic actions in training data. Experiments on three activity datasets demonstrate that our method achieves promising activity recognition results and obtains rich, hierarchical descriptions of activity videos.
AB - This paper addresses the challenging problem of complex human activity understanding from long videos. Towards this goal, we propose a hierarchical description of an activity video that specifies which activity occurs, what atomic actions compose it, and when each atomic action happens in the video. In our work, each complex activity is characterized as a composition of simple motion units (called atomic actions), and different atomic actions are explained by different video segments. We develop a latent discriminative structural model that detects the complex activity and its atomic actions while simultaneously learning the temporal structure of the atomic actions. A segment-annotation mapping matrix is introduced to relate video segments to their associated atomic actions, allowing different video segments to explain different atomic actions. The segment-annotation mapping matrix is treated as latent information in the model, since its ground truth is available during neither training nor testing. Moreover, we present a semi-supervised learning method that automatically predicts the atomic action labels of unlabeled training videos when labeled training data is limited, which greatly alleviates the laborious and time-consuming annotation of atomic actions in training data. Experiments on three activity datasets demonstrate that our method achieves promising activity recognition results and obtains rich, hierarchical descriptions of activity videos.
KW - Activity understanding
KW - Atomic action
KW - Hierarchical video description
KW - Latent structural model
UR - http://www.scopus.com/inward/record.url?scp=84961840030&partnerID=8YFLogxK
U2 - 10.1007/s11263-016-0897-2
DO - 10.1007/s11263-016-0897-2
M3 - Article
AN - SCOPUS:84961840030
SN - 0920-5691
VL - 118
SP - 240
EP - 255
JO - International Journal of Computer Vision
JF - International Journal of Computer Vision
IS - 2
ER -