TY - GEN
T1 - Recognizing human actions from low-resolution videos by region-based mixture models
AU - Zhao, Ying
AU - Di, Huijun
AU - Zhang, Jian
AU - Lu, Yao
AU - Lv, Feng
N1 - Publisher Copyright:
© 2016 IEEE.
PY - 2016/8/25
Y1 - 2016/8/25
N2 - Recognizing human actions from low-resolution (LR) videos is essential for many applications, including large-scale video surveillance, sports video analysis and intelligent aerial vehicles. Currently, state-of-the-art performance in action recognition is achieved by dense trajectories extracted with optical flow algorithms. However, optical flow algorithms are far from reliable in LR videos. In addition, the spatial and temporal layout of features is a powerful cue for action discrimination, yet most existing methods encode this layout by first segmenting body parts, which is not feasible in LR videos. To address these problems, we adopt the Layered Elastic Motion Tracking (LEMT) method to extract a set of long-term motion trajectories and a long-term common shape from each video sequence, where the extracted trajectories are much denser than those of sparse interest points (SIPs); we then present a hybrid feature representation that integrates both shape and motion features; and finally we propose a Region-based Mixture Model (RMM) for action classification. The RMM models the spatial layout of features without requiring body-part segmentation. Experiments are conducted on two publicly available LR human action datasets. Of these, the UT-Tower dataset is particularly challenging because the average height of the human figures is only about 20 pixels. The proposed approach attains near-perfect accuracy on both datasets.
AB - Recognizing human actions from low-resolution (LR) videos is essential for many applications, including large-scale video surveillance, sports video analysis and intelligent aerial vehicles. Currently, state-of-the-art performance in action recognition is achieved by dense trajectories extracted with optical flow algorithms. However, optical flow algorithms are far from reliable in LR videos. In addition, the spatial and temporal layout of features is a powerful cue for action discrimination, yet most existing methods encode this layout by first segmenting body parts, which is not feasible in LR videos. To address these problems, we adopt the Layered Elastic Motion Tracking (LEMT) method to extract a set of long-term motion trajectories and a long-term common shape from each video sequence, where the extracted trajectories are much denser than those of sparse interest points (SIPs); we then present a hybrid feature representation that integrates both shape and motion features; and finally we propose a Region-based Mixture Model (RMM) for action classification. The RMM models the spatial layout of features without requiring body-part segmentation. Experiments are conducted on two publicly available LR human action datasets. Of these, the UT-Tower dataset is particularly challenging because the average height of the human figures is only about 20 pixels. The proposed approach attains near-perfect accuracy on both datasets.
KW - Action Recognition
KW - Elastic Motion Tracking
KW - Low-resolution (LR)
KW - Mixture Model
UR - http://www.scopus.com/inward/record.url?scp=84987624525&partnerID=8YFLogxK
U2 - 10.1109/ICME.2016.7552886
DO - 10.1109/ICME.2016.7552886
M3 - Conference contribution
AN - SCOPUS:84987624525
T3 - Proceedings - IEEE International Conference on Multimedia and Expo
BT - 2016 IEEE International Conference on Multimedia and Expo, ICME 2016
PB - IEEE Computer Society
T2 - 2016 IEEE International Conference on Multimedia and Expo, ICME 2016
Y2 - 11 July 2016 through 15 July 2016
ER -