Action recognition using context and appearance distribution features

Xinxiao Wu; Dong Xu; Lixin Duan; Jiebo Luo

doi:10.1109/CVPR.2011.5995624

Xinxiao Wu^*, Dong Xu, Lixin Duan, Jiebo Luo

^*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

208 Citations (Scopus)

Abstract

We first propose a new spatio-temporal context distribution feature of interest points for human action recognition. Each action video is expressed as a set of relative XYT coordinates between pairwise interest points in a local region. We learn a global GMM (referred to as a Universal Background Model, UBM) using the relative coordinate features from all the training videos, and then represent each video as the normalized parameters of a video-specific GMM adapted from the global GMM. To capture the spatio-temporal relationships at different levels, multiple GMMs are used to describe the context distributions of interest points over multi-scale local regions. To describe the appearance information of an action video, we also propose using a GMM to characterize the distribution of local appearance features from the cuboids centered around the interest points. Accordingly, an action video can be represented by two types of distribution features: 1) multiple GMM distributions of spatio-temporal context; 2) a GMM distribution of local video appearance. To effectively fuse these two types of heterogeneous and complementary distribution features, we additionally propose a new learning algorithm, called Multiple Kernel Learning with Augmented Features (AFMKL), to learn an adapted classifier based on multiple kernels and the pre-learned classifiers of other action classes. Extensive experiments on the KTH, multi-view IXMAS, and complex UCF sports datasets demonstrate that our method generally achieves higher recognition accuracy than other state-of-the-art methods.
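
The context feature described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `relative_xyt` and `map_adapt_means`, the Euclidean neighbourhood of size `radius`, the diagonal covariances, and the mean-only relevance-MAP update with a `relevance` factor are all assumptions borrowed from common GMM-UBM practice. The UBM parameters are taken as given rather than trained here.

```python
import numpy as np


def relative_xyt(points, radius):
    """Relative XYT offsets between every pair of interest points
    that lie within `radius` of each other (assumed Euclidean in XYT)."""
    points = np.asarray(points, dtype=float)
    diffs = points[None, :, :] - points[:, None, :]  # diffs[i, j] = p_j - p_i
    dist = np.linalg.norm(diffs, axis=2)
    # Keep neighbouring pairs only, and drop each point paired with itself.
    mask = (dist <= radius) & ~np.eye(len(points), dtype=bool)
    return diffs[mask]


def map_adapt_means(features, weights, means, variances, relevance=16.0):
    """Adapt the means of a diagonal-covariance UBM to one video's features.

    weights: (K,), means: (K, D), variances: (K, D), features: (N, D).
    Only the means are adapted; this is an assumed simplification.
    """
    features = np.asarray(features, dtype=float)
    diff = features[:, None, :] - means[None, :, :]  # (N, K, D)
    # Log-density of each feature under each Gaussian component.
    log_prob = -0.5 * (
        np.sum(diff ** 2 / variances, axis=2)
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)
    )
    log_post = np.log(weights) + log_prob
    log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)  # responsibilities, (N, K)

    n_k = post.sum(axis=0)  # soft count per component
    mean_k = (post.T @ features) / np.maximum(n_k, 1e-12)[:, None]
    alpha = (n_k / (n_k + relevance))[:, None]  # data-dependent mixing weight
    return alpha * mean_k + (1.0 - alpha) * means
```

Under these assumptions, a video-level descriptor would be obtained by calling `relative_xyt` on the video's interest points, passing the result to `map_adapt_means`, and flattening the adapted means into one vector. Repeating this with several values of `radius` gives one descriptor per scale, mirroring the multi-scale local regions mentioned in the abstract.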

Original language	English
Title of host publication	2011 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2011
Publisher	IEEE Computer Society
Pages	489-496
Number of pages	8
ISBN (Print)	9781457703942
DOIs	https://doi.org/10.1109/CVPR.2011.5995624
Publication status	Published - 2011
Externally published	Yes

Publication series

Name	Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
ISSN (Print)	1063-6919

Cite this

Wu, X., Xu, D., Duan, L., & Luo, J. (2011). Action recognition using context and appearance distribution features. In 2011 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2011 (pp. 489-496). Article 5995624 (Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition). IEEE Computer Society. https://doi.org/10.1109/CVPR.2011.5995624

@inproceedings{b3b5d93c3a824f1699bf1383113ff414,

title = "Action recognition using context and appearance distribution features",

author = "Xinxiao Wu and Dong Xu and Lixin Duan and Jiebo Luo",

year = "2011",

doi = "10.1109/CVPR.2011.5995624",

language = "English",

isbn = "9781457703942",

series = "Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition",

publisher = "IEEE Computer Society",

pages = "489--496",

booktitle = "2011 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2011",

address = "United States",

}

Wu, X, Xu, D, Duan, L & Luo, J 2011, Action recognition using context and appearance distribution features. in 2011 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2011., 5995624, Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, IEEE Computer Society, pp. 489-496. https://doi.org/10.1109/CVPR.2011.5995624

Action recognition using context and appearance distribution features. / Wu, Xinxiao; Xu, Dong; Duan, Lixin et al.
2011 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2011. IEEE Computer Society, 2011. p. 489-496 5995624 (Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition).

TY - GEN

T1 - Action recognition using context and appearance distribution features

AU - Wu, Xinxiao

AU - Xu, Dong

AU - Duan, Lixin

AU - Luo, Jiebo

PY - 2011

Y1 - 2011

UR - http://www.scopus.com/inward/record.url?scp=80052908096&partnerID=8YFLogxK

U2 - 10.1109/CVPR.2011.5995624

DO - 10.1109/CVPR.2011.5995624

M3 - Conference contribution

AN - SCOPUS:80052908096

SN - 9781457703942

T3 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

SP - 489

EP - 496

BT - 2011 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2011

PB - IEEE Computer Society

ER -
