融合语义信息的视频摘要生成

Rui Hua, Xinxiao Wu*, Wentian Zhao

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

2 Citations (Scopus)

Abstract

Video summarization aims to generate a short, compact summary that represents the original video. However, existing methods focus more on the representativeness and diversity of the selected content and less on semantic information. To fully exploit the semantic information of video content, we propose a novel video summarization model that learns a visual-semantic embedding space so that the video features carry rich semantic information. The model can simultaneously generate a video summary and a text summary describing the original video. It consists of three modules: a frame-level score weighting module that combines convolutional layers and fully connected layers; a visual-semantic embedding module that embeds the video and text in a common embedding space and pulls them close to each other so that the two kinds of features promote each other; and a video caption generation module that generates a video summary with semantic information by minimizing the distance between the generated description of the video summary and the manually annotated text of the original video. At test time, alongside the video summary we obtain a short text summary as a by-product, which helps people understand the video content more intuitively. Experiments on the SumMe and TVSum datasets show that, by fusing semantic information, the proposed model outperforms existing state-of-the-art methods, improving F-score by 0.5% and 1.6%, respectively.
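The abstract describes the architecture in prose only. As a rough, non-authoritative illustration of the visual-semantic embedding idea, the following PyTorch sketch projects frame-level video features and word-level text features into a common space and trains them with a bidirectional ranking loss that pulls matched video-text pairs together; all module names, dimensions, and the choice of a hinge-based loss are assumptions made for illustration, not the authors' published code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VisualSemanticEmbedding(nn.Module):
    """Hypothetical sketch: embed video and text in a common space."""
    def __init__(self, video_dim=2048, text_dim=300, embed_dim=512, hidden_dim=512):
        super().__init__()
        # Video side: per-frame CNN features -> common embedding space
        self.video_proj = nn.Linear(video_dim, embed_dim)
        # Text side: word embeddings -> LSTM -> common embedding space
        self.text_lstm = nn.LSTM(text_dim, hidden_dim, batch_first=True)
        self.text_proj = nn.Linear(hidden_dim, embed_dim)

    def forward(self, video_feats, text_embeds):
        # video_feats: (batch, n_frames, video_dim); mean-pool over frames
        v = self.video_proj(video_feats.mean(dim=1))
        # text_embeds: (batch, n_words, text_dim); use the last LSTM state
        _, (h, _) = self.text_lstm(text_embeds)
        t = self.text_proj(h[-1])
        # L2-normalize so a dot product is cosine similarity
        return F.normalize(v, dim=-1), F.normalize(t, dim=-1)

def ranking_loss(v, t, margin=0.2):
    """Bidirectional hinge ranking loss: a matched (video, text) pair
    should score higher than every mismatched pair in the batch."""
    scores = v @ t.T                                  # (batch, batch) similarities
    pos = scores.diag().unsqueeze(1)                  # matched-pair scores
    cost_t = (margin + scores - pos).clamp(min=0)     # video -> wrong texts
    cost_v = (margin + scores - pos.T).clamp(min=0)   # text -> wrong videos
    mask = torch.eye(scores.size(0), dtype=torch.bool, device=scores.device)
    cost_t = cost_t.masked_fill(mask, 0)
    cost_v = cost_v.masked_fill(mask, 0)
    return cost_t.sum() + cost_v.sum()
```

In this sketch the ranking loss plays the role the abstract assigns to the embedding module: matched video and caption embeddings are pushed closer together than any mismatched pair in the batch, so the video features absorb semantic information from the text.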

Translated title of the contribution: Video summarization by learning semantic information
Original language: Chinese (Traditional)
Pages (from-to): 650-657
Number of pages: 8
Journal: Beijing Hangkong Hangtian Daxue Xuebao/Journal of Beijing University of Aeronautics and Astronautics
Volume: 47
Issue number: 3
DOI
Publication status: Published - Mar 2021

Keywords

  • Long Short-Term Memory (LSTM) model
  • Video captioning
  • Video key frame
  • Video summarization
  • Visual-semantic embedding space
