Weakly-Supervised Movie Trailer Generation Driven by Multi-Modal Semantic Consistency

  • Sidan Zhu
  • , Yutong Wang
  • , Hongteng Xu
  • , Dixin Luo*
  • *Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

As an essential movie promotional tool, trailers are designed to capture the audience's interest through the skillful editing of key movie shots. Although some attempts have been made for automatic trailer generation, existing methods often rely on predefined rules or manual fine-grained annotations and fail to fully leverage the multi-modal information of movies, resulting in unsatisfactory trailer generation results. In this study, we introduce a weakly-supervised trailer generation method driven by multi-modal semantic consistency. Specifically, we design a multi-modal trailer generation framework that selects and sorts key movie shots based on input music and movie metadata (e.g., category tags and plot keywords) and adds narration to the generated trailer based on movie subtitles. We utilize two pseudo-scores derived from the proposed framework as labels and thus train the model under a weakly-supervised learning paradigm, ensuring trailerness consistency for key shot selection and emotion consistency for key shot sorting, respectively. As a result, we can learn the proposed model solely based on movie-trailer pairs without any fine-grained annotations. Both objective experimental results and subjective user studies demonstrate the superior performance of our method over previous works. The code is available at https://github.com/Dixin-Lab/MMSC.

Original languageEnglish
Title of host publicationProceedings of the 34th International Joint Conference on Artificial Intelligence, IJCAI 2025
EditorsJames Kwok
PublisherInternational Joint Conferences on Artificial Intelligence
Pages10234-10242
Number of pages9
ISBN (Electronic)9781956792065
DOIs
Publication statusPublished - 2025
Event34th Internationa Joint Conference on Artificial Intelligence, IJCAI 2025 - Montreal, Canada
Duration: 16 Aug 202522 Aug 2025

Publication series

NameIJCAI International Joint Conference on Artificial Intelligence
ISSN (Print)1045-0823

Conference

Conference34th Internationa Joint Conference on Artificial Intelligence, IJCAI 2025
Country/TerritoryCanada
CityMontreal
Period16/08/2522/08/25

Fingerprint

Dive into the research topics of 'Weakly-Supervised Movie Trailer Generation Driven by Multi-Modal Semantic Consistency'. Together they form a unique fingerprint.

Cite this