TY - JOUR
T1 - A Hierarchical Video Description for Complex Activity Understanding
AU - Liu, Cuiwei
AU - Wu, Xinxiao
AU - Jia, Yunde
N1 - Publisher Copyright:
© 2016, Springer Science+Business Media New York.
PY - 2016/6/1
Y1 - 2016/6/1
N2 - This paper addresses the challenging problem of complex human activity understanding from long videos. Towards this goal, we propose a hierarchical description of an activity video that specifies which activity occurs, what atomic actions compose it, and when each atomic action happens in the video. In our work, each complex activity is characterized as a composition of simple motion units (called atomic actions), and different atomic actions are explained by different video segments. We develop a latent discriminative structural model that detects the complex activity and its atomic actions while simultaneously learning the temporal structure of the atomic actions. A segment-annotation mapping matrix is introduced to relate video segments to their associated atomic actions, allowing different video segments to explain different atomic actions. The segment-annotation mapping matrix is treated as latent information in the model, since its ground truth is available during neither training nor testing. Moreover, we present a semi-supervised learning method that automatically predicts the atomic action labels of unlabeled training videos when labeled training data is limited, which greatly alleviates the laborious and time-consuming annotation of atomic actions in training data. Experiments on three activity datasets demonstrate that our method achieves promising activity recognition results and obtains rich, hierarchical descriptions of activity videos.
AB - This paper addresses the challenging problem of complex human activity understanding from long videos. Towards this goal, we propose a hierarchical description of an activity video that specifies which activity occurs, what atomic actions compose it, and when each atomic action happens in the video. In our work, each complex activity is characterized as a composition of simple motion units (called atomic actions), and different atomic actions are explained by different video segments. We develop a latent discriminative structural model that detects the complex activity and its atomic actions while simultaneously learning the temporal structure of the atomic actions. A segment-annotation mapping matrix is introduced to relate video segments to their associated atomic actions, allowing different video segments to explain different atomic actions. The segment-annotation mapping matrix is treated as latent information in the model, since its ground truth is available during neither training nor testing. Moreover, we present a semi-supervised learning method that automatically predicts the atomic action labels of unlabeled training videos when labeled training data is limited, which greatly alleviates the laborious and time-consuming annotation of atomic actions in training data. Experiments on three activity datasets demonstrate that our method achieves promising activity recognition results and obtains rich, hierarchical descriptions of activity videos.
KW - Activity understanding
KW - Atomic action
KW - Hierarchical video description
KW - Latent structural model
UR - http://www.scopus.com/inward/record.url?scp=84961840030&partnerID=8YFLogxK
U2 - 10.1007/s11263-016-0897-2
DO - 10.1007/s11263-016-0897-2
M3 - Article
AN - SCOPUS:84961840030
SN - 0920-5691
VL - 118
SP - 240
EP - 255
JO - International Journal of Computer Vision
JF - International Journal of Computer Vision
IS - 2
ER -