融合语义信息的视频摘要生成

Translated title of the contribution: Video summarization by learning semantic information

Rui Hua, Xinxiao Wu*, Wentian Zhao

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-review

2 Citations (Scopus)

Abstract

Video summarization aims to generate a short, compact summary that represents the original video. Existing methods, however, focus mainly on the representativeness and diversity of the summary and pay less attention to semantic information. To fully exploit the semantic information in video content, we propose a novel video summarization model that learns a visual-semantic embedding space so that the video features carry rich semantic information. The model generates a video summary and, simultaneously, a text summary that describes the original video. It consists of three main modules: a frame-level score weighting module that combines convolutional layers and fully connected layers; a visual-semantic embedding module that embeds the video and the text in a common embedding space and pulls them close to each other so that the two kinds of features reinforce one another; and a video caption generation module that produces a video summary with semantic information by minimizing the distance between the generated description of the video summary and the manually annotated text of the original video. At test time, the model yields a short text summary as a by-product of the video summary, which helps people understand the video content more intuitively. Experiments on the SumMe and TVSum datasets show that, by fusing semantic information, the proposed model outperforms existing advanced methods, improving the F-score by 0.5% and 1.6%, respectively.
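The idea behind the first two modules can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the layer shapes, the pooling rule, or the distance function, so the softmax weighting, the linear projections, the L2 normalization, and the squared Euclidean distance below are all assumptions, and every function name is hypothetical. The sketch only shows how frame-level scores could weight frame features into one video vector, and how a video vector and a text vector could be compared in a common embedding space.

```python
import math


def softmax(scores):
    """Turn raw frame scores into weights that sum to 1 (assumed weighting rule)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_video_feature(frame_features, frame_scores):
    """Pool per-frame feature vectors into one video vector using the frame weights."""
    weights = softmax(frame_scores)
    dim = len(frame_features[0])
    return [
        sum(w * frame[d] for w, frame in zip(weights, frame_features))
        for d in range(dim)
    ]


def project(vector, matrix):
    """Linear projection; `matrix` is a list of rows, one row per output dimension."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def l2_normalize(vector):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm > 0 else list(vector)


def embedding_distance(video_vec, text_vec, w_video, w_text):
    """Squared Euclidean distance between the two vectors in the common space.

    A training loop would minimize this value for matching video/text pairs;
    the projection matrices here are hypothetical stand-ins for learned weights.
    """
    v = l2_normalize(project(video_vec, w_video))
    t = l2_normalize(project(text_vec, w_text))
    return sum((a - b) ** 2 for a, b in zip(v, t))


# Toy data: two frames with 2-D features and equal scores.
frames = [[1.0, 0.0], [0.0, 1.0]]
video = weighted_video_feature(frames, [0.0, 0.0])

w_video = [[1.0, 0.0], [0.0, 1.0]]            # 2-D video feature -> 2-D common space
w_text = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # 3-D text feature -> 2-D common space

matching_text = [1.0, 1.0, 0.0]
unrelated_text = [1.0, 0.0, 0.0]
print(embedding_distance(video, matching_text, w_video, w_text))
print(embedding_distance(video, unrelated_text, w_video, w_text))
```

In this toy setup the matching text lands on the same direction as the pooled video vector, so its distance is smaller than that of the unrelated text. In a real model the projections would be learned and the frame scores would come from the scoring network rather than being fixed by hand.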

Original language: Chinese (Traditional)
Pages (from-to): 650-657
Number of pages: 8
Journal: Beijing Hangkong Hangtian Daxue Xuebao/Journal of Beijing University of Aeronautics and Astronautics
Volume: 47
Issue number: 3
Publication status: Published - Mar 2021
