Unsupervised deep learning of mid-level video representation for action recognition

Jingyi Hou, Xinxiao Wu, Jin Chen, Jiebo Luo, Yunde Jia

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

8 Citations (Scopus)

Abstract

Current deep learning methods for action recognition rely heavily on large scale labeled video datasets. Manually annotating video datasets is laborious and may introduce unexpected bias to train complex deep models for learning video representation. In this paper, we propose an unsupervised deep learning method which employs unlabeled local spatial-temporal volumes extracted from action videos to learn mid-level video representation for action recognition. Specifically, our method simultaneously discovers mid-level semantic concepts by discriminative clustering and optimizes local spatial-temporal features by two relatively small and simple deep neural networks. The clustering generates semantic visual concepts that guide the training of the deep networks, and the networks in turn guarantee the robustness of the semantic concepts. Experiments on the HMDB51 and the UCF101 datasets demonstrate the superiority of the proposed method, even over several supervised learning methods.

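Original language: English
Title of host publication: 32nd AAAI Conference on Artificial Intelligence, AAAI 2018
Publisher: AAAI Press
Pages: 6910-6917
Number of pages: 8
ISBN (electronic): 9781577358008
Publication status: Published - 2018
Event: 32nd AAAI Conference on Artificial Intelligence, AAAI 2018 - New Orleans, United States
Duration: 2 Feb 2018 to 7 Feb 2018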

Publication series

Name: 32nd AAAI Conference on Artificial Intelligence, AAAI 2018

Conference

Conference: 32nd AAAI Conference on Artificial Intelligence, AAAI 2018
Country/Territory: United States
City: New Orleans
Period: 2/02/18 to 7/02/18
