Video Summarization Using Denoising Diffusion Probabilistic Model

Zirui Shang, Yubo Zhu, Hongxi Li, Shuo Yang, Xinxiao Wu*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Video summarization aims to eliminate visual redundancy while retaining the key parts of a video to construct concise and comprehensive synopses. Most existing methods use discriminative models to predict the importance scores of video frames. However, these methods are susceptible to annotation inconsistency caused by the inherent subjectivity of different annotators when annotating the same video. In this paper, we introduce a generative framework for video summarization that learns how to generate summaries from a probability distribution perspective, effectively reducing the interference of subjective annotation noise. Specifically, we propose a novel diffusion summarization method based on the Denoising Diffusion Probabilistic Model (DDPM), which learns the probability distribution of training data through noise prediction and generates summaries by iterative denoising. Compared with discriminative methods, our method is more resistant to subjective annotation noise, is less prone to overfitting the training data, and has strong generalization ability. Moreover, to facilitate training the DDPM with limited data, we employ an unsupervised video summarization model to implement the earlier denoising process. Extensive experiments on various datasets (TVSum, SumMe, and FPVSum) demonstrate the effectiveness of our method.
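The two DDPM ingredients named in the abstract, noise prediction and iterative denoising, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the summary is represented as a plain list of per-frame importance scores, uses a linear noise schedule, and replaces the trained noise-prediction network with a hypothetical "oracle" that already knows the clean scores. All function names (`make_schedule`, `add_noise`, `denoise_step`, `sample`, `make_oracle`) are illustrative, and the unsupervised model that the paper uses for the earlier denoising steps is not modeled here.

```python
import math
import random


def make_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule (an assumption) plus cumulative products of 1 - beta."""
    betas = [beta_start + (beta_end - beta_start) * i / (num_steps - 1)
             for i in range(num_steps)]
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return betas, alpha_bars


def add_noise(x0, t, alpha_bars, rng):
    """Forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    a_bar = alpha_bars[t]
    xt = [math.sqrt(a_bar) * x + math.sqrt(1.0 - a_bar) * e
          for x, e in zip(x0, eps)]
    return xt, eps


def noise_prediction_loss(eps, eps_hat):
    """Training objective: mean squared error between true and predicted noise."""
    return sum((e - h) ** 2 for e, h in zip(eps, eps_hat)) / len(eps)


def denoise_step(xt, t, eps_hat, betas, alpha_bars, rng):
    """One reverse step: remove the predicted noise, then re-inject a little."""
    beta, a_bar = betas[t], alpha_bars[t]
    coef = beta / math.sqrt(1.0 - a_bar)
    mean = [(x - coef * e) / math.sqrt(1.0 - beta) for x, e in zip(xt, eps_hat)]
    if t == 0:
        return mean  # the final step adds no noise
    sigma = math.sqrt(beta)
    return [m + sigma * rng.gauss(0.0, 1.0) for m in mean]


def sample(predict_noise, length, betas, alpha_bars, rng):
    """Iterative denoising: start from Gaussian noise and walk t down to 0."""
    x = [rng.gauss(0.0, 1.0) for _ in range(length)]
    for t in reversed(range(len(betas))):
        x = denoise_step(x, t, predict_noise(x, t), betas, alpha_bars, rng)
    return x


def make_oracle(x0, alpha_bars):
    """Hypothetical stand-in for a trained network: recovers the exact noise."""
    def predict(xt, t):
        a_bar = alpha_bars[t]
        return [(x - math.sqrt(a_bar) * c) / math.sqrt(1.0 - a_bar)
                for x, c in zip(xt, x0)]
    return predict
```

In a real system the oracle would be replaced by a learned network conditioned on video features and trained by minimizing `noise_prediction_loss` over randomly drawn timesteps; the oracle only serves to show that the reverse loop recovers the target scores when the noise prediction is accurate.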

Original language: English
Title of host publication: Special Track on AI Alignment
Editors: Toby Walsh, Julie Shah, Zico Kolter
Publisher: Association for the Advancement of Artificial Intelligence
Pages: 6776-6784
Number of pages: 9
Edition: 7
ISBN (Electronic): 157735897X, 9781577358978
DOIs
Publication status: Published - 11 Apr 2025
Externally published: Yes
Event: 39th Annual AAAI Conference on Artificial Intelligence, AAAI 2025 - Philadelphia, United States
Duration: 25 Feb 2025 → 4 Mar 2025

Publication series

Name: Proceedings of the AAAI Conference on Artificial Intelligence
Number: 7
Volume: 39
ISSN (Print): 2159-5399
ISSN (Electronic): 2374-3468

Conference

Conference: 39th Annual AAAI Conference on Artificial Intelligence, AAAI 2025
Country/Territory: United States
City: Philadelphia
Period: 25/02/25 → 4/03/25
