Abstract
State-of-the-art performance in human action recognition is achieved with dense trajectories extracted by optical flow algorithms. However, optical flow algorithms are far from perfect on low-resolution (LR) videos. In addition, the spatial and temporal layout of features is a powerful cue for action discrimination, yet most existing methods encode this layout by first segmenting body parts, which is not feasible in LR videos. To address these problems, we first adopt the Layered Elastic Motion Tracking (LEMT) method to extract a set of long-term motion trajectories and a long-term common shape from each video sequence; the extracted trajectories are much denser than those of sparse interest points (SIPs). We then present a hybrid feature representation that integrates both shape and motion features. Finally, we propose a Region-based Mixture Model (RMM) for action classification. The RMM encodes the spatial layout of features without requiring body-part segmentation. Experimental results show that the approach is effective and, more importantly, more general for LR recognition tasks.
Original language | English |
---|---|
Pages (from-to) | 1-15 |
Number of pages | 15 |
Journal | Neurocomputing |
Volume | 247 |
Publication status | Published - 19 Jul 2017 |
Keywords
- Action recognition
- Elastic motion tracking
- Expectation Maximization (EM) algorithm
- Low-resolution
- Mixture model