SHTVS: Shot-level based Hierarchical Transformer for Video Summarization

Yubo An, Shenghui Zhao*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Citation (Scopus)

Abstract

In this paper, a Shot-level based Hierarchical Transformer for Video Summarization (SHTVS) is proposed for supervised video summarization. Different from most existing methods that employ bidirectional long short-term memory or use self-attention to replace certain components while keeping the overall structure in place, our method shows that a pure Transformer taking video feature sequences as input can achieve competitive performance in video summarization. In addition, to make better use of the multi-shot characteristic of a video, each video feature sequence is first split into shot-level feature sequences with kernel temporal segmentation, which are then fed into a shot-level Transformer encoder to learn shot-level representations. Finally, the shot-level representations and the original video feature sequence are integrated and fed into a frame-level Transformer encoder to predict frame-level importance scores. Extensive experimental results on two benchmark datasets (SumMe and TVSum) prove the effectiveness of our method.
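As a rough illustration of the pipeline described above (not the authors' released code), the following PyTorch sketch encodes each shot with a shot-level Transformer encoder, pools it into a shot representation, prepends the shot representations to the original frame features, and predicts per-frame importance scores with a frame-level Transformer encoder. The class name, layer counts, feature dimension, mean pooling, and concatenation-based integration are assumptions for illustration; shot boundaries are assumed to come from an external kernel temporal segmentation step, which is not shown.

import torch
import torch.nn as nn

class SHTVSSketch(nn.Module):
    # Hypothetical sketch: one Transformer encoder over the frames within a shot,
    # and one over the fused (shot representations + frame features) sequence.
    def __init__(self, feat_dim=1024, n_heads=8, n_layers=2):
        super().__init__()
        shot_layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=n_heads, batch_first=True)
        frame_layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=n_heads, batch_first=True)
        self.shot_encoder = nn.TransformerEncoder(shot_layer, num_layers=n_layers)
        self.frame_encoder = nn.TransformerEncoder(frame_layer, num_layers=n_layers)
        self.score_head = nn.Linear(feat_dim, 1)

    def forward(self, frames, shot_boundaries):
        # frames: (1, T, feat_dim) frame-level features of one video.
        # shot_boundaries: list of (start, end) index pairs, e.g. from KTS.
        shot_reprs = []
        for start, end in shot_boundaries:
            encoded_shot = self.shot_encoder(frames[:, start:end])   # shot-level encoding
            shot_reprs.append(encoded_shot.mean(dim=1))              # pool to one vector per shot
        shot_reprs = torch.stack(shot_reprs, dim=1)                  # (1, S, feat_dim)
        # Integrate shot representations with the original frame sequence by
        # prepending them (one plausible integration; the paper may differ).
        fused = self.frame_encoder(torch.cat([shot_reprs, frames], dim=1))
        frame_part = fused[:, shot_reprs.size(1):]                   # keep only frame positions
        return torch.sigmoid(self.score_head(frame_part)).squeeze(-1)  # (1, T) importance scores

For example, SHTVSSketch()(torch.randn(1, 300, 1024), [(0, 120), (120, 300)]) yields one score per frame, which a downstream step would convert into a keyshot summary.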

Original language: English
Title of host publication: ICIGP 2022 - Proceedings of the 2022 5th International Conference on Image and Graphics Processing
Publisher: Association for Computing Machinery
Pages: 268-274
Number of pages: 7
ISBN (Electronic): 9781450395465
DOI
Publication status: Published - 7 Jan 2022
Event: 5th International Conference on Image and Graphics Processing, ICIGP 2022 - Virtual, Online, China
Duration: 7 Jan 2022 → 9 Jan 2022

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 5th International Conference on Image and Graphics Processing, ICIGP 2022
Country/Territory: China
City: Virtual, Online
Period: 7/01/22 → 9/01/22
