External Memory Matters: Generalizable Object-Action Memory for Retrieval-Augmented Long-Term Video Understanding

  • Jisheng Dang
  • Huicheng Zheng*
  • Xudong Wu
  • Jingmei Jiao
  • Bimei Wang*
  • Jun Yang
  • Bin Hu
  • Jianhuang Lai
  • Tat Seng Chua

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

Long video understanding with Large Language Models (LLMs) enables the description of objects that are not explicitly present in the training data. However, continuous changes in known objects and the emergence of new ones require up-to-date knowledge of objects and their dynamics for effective understanding of the open world. To address this, we propose an efficient Retrieval-Enhanced Video Understanding method, dubbed REVU, which leverages external knowledge to enhance the performance of open-world learning. First, REVU introduces an extensible external text-object memory with minimal text-visual mapping, combining static and dynamic multimodal information to help LLM-based models align text and vision features. Second, REVU retrieves object information from external databases and dynamically integrates frame-specific data from videos, enabling effective knowledge aggregation to comprehend the open world. Experiments on multiple benchmark video understanding datasets show that our model achieves state-of-the-art performance and robust generalization, adapting to out-of-domain data without additional fine-tuning or retraining.

Original language: English
Title of host publication: Proceedings of the 34th International Joint Conference on Artificial Intelligence, IJCAI 2025
Editors: James Kwok
Publisher: International Joint Conferences on Artificial Intelligence
Pages: 864-872
Number of pages: 9
ISBN (Electronic): 9781956792065
DOIs
Publication status: Published - 2025
Externally published: Yes
Event: 34th International Joint Conference on Artificial Intelligence, IJCAI 2025 - Montreal, Canada
Duration: 16 Aug 2025 to 22 Aug 2025

Publication series

Name: IJCAI International Joint Conference on Artificial Intelligence
ISSN (Print): 1045-0823

Conference

Conference: 34th International Joint Conference on Artificial Intelligence, IJCAI 2025
Country/Territory: Canada
City: Montreal
Period: 16/08/25 to 22/08/25
