Abstract
Achieving joint segmentation and recognition of continuous actions in a long-term video is a challenging task due to the varying durations of actions and the complex transitions of multiple actions. In this paper, a novel discriminative structural model is proposed for splitting a long-term video into segments and annotating the action label of each segment. A set of state variables is introduced into the model to explore discriminative semantic concepts shared among different actions. To exploit the statistical dependences among segments, temporal context is captured at both the action level and the semantic concept level. The state variables are treated as latent information in the discriminative structural model and inferred during both training and testing. Experiments on multi-view IXMAS and realistic Hollywood datasets demonstrate the effectiveness of the proposed method.
Original language | English |
---|---|
Pages (from-to) | 31627-31645 |
Number of pages | 19 |
Journal | Multimedia Tools and Applications |
Volume | 77 |
Issue number | 24 |
DOI | |
Publication status | Published - 1 Dec 2018 |