Deeply-supervised CNN model for action recognition with trainable feature aggregation

Yang Li, Kan Li*, Xinxin Wang

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

4 Citations (Scopus)

Abstract

In this paper, we propose a deeply-supervised CNN model for action recognition that fully exploits powerful hierarchical features of CNNs. In this model, we build multi-level video representations by applying our proposed aggregation module at different convolutional layers. Moreover, we train this model in a deep supervision manner, which brings improvement in both performance and efficiency. Meanwhile, in order to capture the temporal structure as well as preserve more details about actions, we propose a trainable aggregation module. It models the temporal evolution of each spatial location and projects them into a semantic space using the Vector of Locally Aggregated Descriptors (VLAD) technique. This deeply-supervised CNN model integrating the powerful aggregation module provides a promising solution to recognize actions in videos. We conduct experiments on two action recognition datasets: HMDB51 and UCF101. Results show that our model outperforms the state-of-the-art methods.

Original languageEnglish
Title of host publicationProceedings of the 27th International Joint Conference on Artificial Intelligence, IJCAI 2018
EditorsJerome Lang
PublisherInternational Joint Conferences on Artificial Intelligence
Pages807-813
Number of pages7
ISBN (Electronic)9780999241127
DOIs
Publication statusPublished - 2018
Event27th International Joint Conference on Artificial Intelligence, IJCAI 2018 - Stockholm, Sweden
Duration: 13 Jul 201819 Jul 2018

Publication series

NameIJCAI International Joint Conference on Artificial Intelligence
Volume2018-July
ISSN (Print)1045-0823

Conference

Conference27th International Joint Conference on Artificial Intelligence, IJCAI 2018
Country/TerritorySweden
CityStockholm
Period13/07/1819/07/18

Fingerprint

Dive into the research topics of 'Deeply-supervised CNN model for action recognition with trainable feature aggregation'. Together they form a unique fingerprint.

Cite this