Action recognition with bootstrapping based long-range temporal context attention

Ziming Liu, Guangyu Gao*, A. K. Qin, Tong Wu, Chi Harold Liu

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

7 Citations (Scopus)

Abstract

Actions typically involve complex visual variations across a long, redundant video sequence. Instead of focusing on a limited-range sequence, i.e., convolution over adjacent frames, in this paper we propose an action recognition approach with bootstrapping-based long-range temporal context attention. Specifically, because the visual content of a local region varies across frames, we aim to capture temporal context by proposing the Temporal Pixels based Parallel-head Attention (TPPA) block. In TPPA, we apply the self-attention mechanism between local regions at the same position across temporal frames to capture their interactions. Meanwhile, to deal with video redundancy and capture long-range context, TPPA is extended to the Random Frames based Bootstrapping Attention (RFBA) framework. Because the bootstrap-sampled frames follow the same distribution as the whole video sequence, RFBA not only captures longer temporal context with only a few sampled frames but also obtains a comprehensive representation through multiple rounds of sampling. Furthermore, we apply this temporal context attention to image-based action recognition by transforming the image into a “pseudo video” with spatial shifts. Finally, we conduct extensive experiments and empirical evaluations on two popular datasets: UCF101 for videos and Stanford40 for images. In particular, our approach achieves a top-1 accuracy of 91.7% on UCF101 and an mAP of 90.9% on Stanford40.
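The two ideas in the abstract, attention between the same spatial position across frames and repeated random frame sampling, can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation. It assumes identity query/key/value projections and a single head (the paper's block is parallel-head), uniform sampling with replacement as the reading of "bootstrapping", and simple averaging over sampling rounds. All function names are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_pixel_attention(frames):
    """Self-attention across time, applied independently at each position.

    frames: list of T frames; each frame is a list of P positions;
            each position is a list of C channel values.
    Returns a nested list of the same shape.
    Assumption: identity query/key/value projections, single head.
    """
    T, P, C = len(frames), len(frames[0]), len(frames[0][0])
    scale = 1.0 / math.sqrt(C)
    out = [[None] * P for _ in range(T)]
    for p in range(P):
        # Features of the same position p gathered over all T frames.
        seq = [frames[t][p] for t in range(T)]
        for t in range(T):
            scores = [
                scale * sum(q * k for q, k in zip(seq[t], seq[s]))
                for s in range(T)
            ]
            weights = softmax(scores)
            out[t][p] = [
                sum(weights[s] * seq[s][c] for s in range(T))
                for c in range(C)
            ]
    return out


def bootstrap_frame_indices(num_frames, sample_size, rng):
    """Draw frame indices uniformly with replacement, kept in temporal order."""
    return sorted(rng.randrange(num_frames) for _ in range(sample_size))


def bootstrapped_video_descriptor(video, sample_size, rounds, seed=0):
    """Average attended features over several sampling rounds.

    Assumption: mean pooling over frames, positions, and rounds stands in
    for whatever aggregation the paper actually uses.
    """
    rng = random.Random(seed)
    P, C = len(video[0]), len(video[0][0])
    acc = [0.0] * C
    norm = sample_size * P * rounds
    for _ in range(rounds):
        idx = bootstrap_frame_indices(len(video), sample_size, rng)
        attended = temporal_pixel_attention([video[i] for i in idx])
        for frame in attended:
            for pos in frame:
                for c in range(C):
                    acc[c] += pos[c] / norm
    return acc
```

Calling `bootstrapped_video_descriptor(video, sample_size=4, rounds=3)` on a nested-list video returns one pooled feature vector of length C. Each round attends over only four frames, yet those frames may come from anywhere in the sequence, which is the sense in which a few samples can cover long-range context.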

Original language: English
Title of host publication: MM 2019 - Proceedings of the 27th ACM International Conference on Multimedia
Publisher: Association for Computing Machinery, Inc
Pages: 583-591
Number of pages: 9
ISBN (Electronic): 9781450368896
DOI
Publication status: Published - 15 Oct 2019
Event: 27th ACM International Conference on Multimedia, MM 2019 - Nice, France
Duration: 21 Oct 2019 → 25 Oct 2019

Publication series

Name: MM 2019 - Proceedings of the 27th ACM International Conference on Multimedia

Conference

Conference: 27th ACM International Conference on Multimedia, MM 2019
Country/Territory: France
City: Nice
Period: 21/10/19 → 25/10/19
