TY - GEN
T1 - Recognizing human actions from low-resolution videos by region-based mixture models
AU - Zhao, Ying
AU - Di, Huijun
AU - Zhang, Jian
AU - Lu, Yao
AU - Lv, Feng
N1 - Publisher Copyright:
© 2016 IEEE.
PY - 2016/8/25
Y1 - 2016/8/25
N2 - Recognizing human actions from low-resolution (LR) videos is essential for many applications, including large-scale video surveillance, sports video analysis and intelligent aerial vehicles. Currently, state-of-the-art performance in action recognition is achieved by dense trajectories extracted with optical flow algorithms. However, optical flow algorithms are far from reliable in LR videos. In addition, the spatial and temporal layout of features is a powerful cue for action discrimination, yet most existing methods encode this layout by first segmenting body parts, which is not feasible in LR videos. To address these problems, we adopt the Layered Elastic Motion Tracking (LEMT) method to extract a set of long-term motion trajectories and a long-term common shape from each video sequence, where the extracted trajectories are much denser than those of sparse interest points (SIPs); we then present a hybrid feature representation that integrates both shape and motion features; and finally we propose a Region-based Mixture Model (RMM) for action classification. The RMM models the spatial layout of features without requiring body-part segmentation. Experiments are conducted on two publicly available LR human action datasets. Of these, the UT-Tower dataset is particularly challenging because the average height of the human figures is only about 20 pixels. The proposed approach attains near-perfect accuracy on both datasets.
AB - Recognizing human actions from low-resolution (LR) videos is essential for many applications, including large-scale video surveillance, sports video analysis and intelligent aerial vehicles. Currently, state-of-the-art performance in action recognition is achieved by dense trajectories extracted with optical flow algorithms. However, optical flow algorithms are far from reliable in LR videos. In addition, the spatial and temporal layout of features is a powerful cue for action discrimination, yet most existing methods encode this layout by first segmenting body parts, which is not feasible in LR videos. To address these problems, we adopt the Layered Elastic Motion Tracking (LEMT) method to extract a set of long-term motion trajectories and a long-term common shape from each video sequence, where the extracted trajectories are much denser than those of sparse interest points (SIPs); we then present a hybrid feature representation that integrates both shape and motion features; and finally we propose a Region-based Mixture Model (RMM) for action classification. The RMM models the spatial layout of features without requiring body-part segmentation. Experiments are conducted on two publicly available LR human action datasets. Of these, the UT-Tower dataset is particularly challenging because the average height of the human figures is only about 20 pixels. The proposed approach attains near-perfect accuracy on both datasets.
KW - Action Recognition
KW - Elastic Motion Tracking
KW - Low-resolution (LR)
KW - Mixture Model
UR - http://www.scopus.com/inward/record.url?scp=84987624525&partnerID=8YFLogxK
U2 - 10.1109/ICME.2016.7552886
DO - 10.1109/ICME.2016.7552886
M3 - Conference contribution
AN - SCOPUS:84987624525
T3 - Proceedings - IEEE International Conference on Multimedia and Expo
BT - 2016 IEEE International Conference on Multimedia and Expo, ICME 2016
PB - IEEE Computer Society
T2 - 2016 IEEE International Conference on Multimedia and Expo, ICME 2016
Y2 - 11 July 2016 through 15 July 2016
ER -