TY - GEN
T1 - Combining sparse and dense descriptors with temporal semantic structures for robust human action recognition
AU - Chen, Jie
AU - Zhao, Guoying
AU - Kellokumpu, Vili Petteri
AU - Pietikäinen, Matti
PY - 2011
AB - Automatic categorization of human actions in the real world is very challenging due to large intra-class variations. In this paper, we present a new method for robust recognition of human actions. We first cluster each video in the training set into temporal semantic segments using a dense descriptor. Each segment in the training set is represented by a concatenated histogram of sparse and dense descriptors, and these segment histograms are used to train a classifier. In the recognition stage, a query video is likewise divided into temporal semantic segments by clustering. Each segment receives a confidence score from the trained classifier, and the query video is classified by combining the confidence scores of its segments. To evaluate our approach, we perform experiments on two challenging datasets: the Olympic Sports Dataset (OSD) and the Hollywood Human Action dataset (HOHA). We also test our method on the benchmark KTH human action dataset. Experimental results confirm that our algorithm outperforms state-of-the-art methods.
UR - http://www.scopus.com/inward/record.url?scp=84863082924&partnerID=8YFLogxK
DO - 10.1109/ICCVW.2011.6130431
M3 - Conference contribution
AN - SCOPUS:84863082924
SN - 9781467300629
T3 - Proceedings of the IEEE International Conference on Computer Vision
SP - 1524
EP - 1531
BT - 2011 IEEE International Conference on Computer Vision Workshops, ICCV Workshops 2011
T2 - 2011 IEEE International Conference on Computer Vision Workshops, ICCV Workshops 2011
Y2 - 6 November 2011 through 13 November 2011
ER -