Combining sparse and dense descriptors with temporal semantic structures for robust human action recognition

Jie Chen*, Guoying Zhao, Vili Petteri Kellokumpu, Matti Pietikainen

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

6 Citations (Scopus)

Abstract

Automatic categorization of human actions in the real world is very challenging because of large intra-class variation. In this paper, we present a new method for robust recognition of human actions. We first cluster each video in the training set into temporal semantic segments using a dense descriptor. Each training segment is represented by a concatenated histogram of sparse and dense descriptors, and these segment histograms are used to train a classifier. In the recognition stage, a query video is likewise divided into temporal semantic segments by clustering, and the trained classifier assigns each segment a confidence. We classify the query video by combining the confidences of its segments. To evaluate our approach, we perform experiments on two challenging datasets, the Olympic Sports Dataset (OSD) and the Hollywood Human Action dataset (HOHA). We also test our method on the benchmark KTH human action dataset. Experimental results confirm that our algorithm performs better than the state-of-the-art methods.
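
The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the clustering algorithm, the histogram normalization, the classifier, or the rule for combining segment confidences, so everything below is an assumption made for illustration: frames are taken to arrive with precomputed cluster labels, contiguous runs of the same label are treated as segments, histograms are L1-normalized before concatenation, and per-segment confidences are averaged. The function names (`temporal_segments`, `segment_descriptor`, `classify_video`) are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence, Tuple


def temporal_segments(frame_labels: Sequence[int]) -> List[Tuple[int, int]]:
    """Group consecutive frames with the same cluster label into segments.

    Returns (start, end) frame index pairs with `end` exclusive.
    Assumption: cluster labels per frame were computed beforehand.
    """
    segments = []
    start = 0
    for i in range(1, len(frame_labels) + 1):
        # Close the current run at the end of the video or on a label change.
        if i == len(frame_labels) or frame_labels[i] != frame_labels[start]:
            segments.append((start, i))
            start = i
    return segments


def l1_normalize(hist: Sequence[float]) -> List[float]:
    """Scale a histogram so its bins sum to 1 (left unchanged if all zero)."""
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else list(hist)


def segment_descriptor(sparse_hist: Sequence[float],
                       dense_hist: Sequence[float]) -> List[float]:
    """Concatenate the sparse and dense histograms of one segment.

    Assumption: each histogram is L1-normalized before concatenation.
    """
    return l1_normalize(sparse_hist) + l1_normalize(dense_hist)


def classify_video(segment_scores: Sequence[Dict[str, float]]
                   ) -> Tuple[str, Dict[str, float]]:
    """Combine per-segment class confidences into one video-level label.

    Assumption: confidences are combined by a plain average over segments.
    """
    if not segment_scores:
        raise ValueError("at least one segment is required")
    totals: Dict[str, float] = {}
    for scores in segment_scores:
        for cls, score in scores.items():
            totals[cls] = totals.get(cls, 0.0) + score
    means = {cls: t / len(segment_scores) for cls, t in totals.items()}
    return max(means, key=means.get), means


if __name__ == "__main__":
    # Toy data standing in for real cluster labels and classifier outputs.
    print(temporal_segments([0, 0, 1, 1, 1, 0]))
    print(segment_descriptor([1, 3], [2, 2]))
    print(classify_video([{"run": 0.9, "jump": 0.1},
                          {"run": 0.6, "jump": 0.4}]))
```

Averaging is only one plausible combination rule. A product of confidences or a vote over segments would fit the same interface, and the abstract does not say which rule the method uses.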

Original language: English
Title of host publication: 2011 IEEE International Conference on Computer Vision Workshops, ICCV Workshops 2011
Pages: 1524-1531
Number of pages: 8
Publication status: Published - 2011
Externally published: Yes
Event: 2011 IEEE International Conference on Computer Vision Workshops, ICCV Workshops 2011 - Barcelona, Spain
Duration: 6 Nov 2011 to 13 Nov 2011

Publication series

Name: Proceedings of the IEEE International Conference on Computer Vision

Conference

Conference: 2011 IEEE International Conference on Computer Vision Workshops, ICCV Workshops 2011
Country/Territory: Spain
City: Barcelona
Period: 6/11/11 to 13/11/11
