
HM-RAG: Long video reasoning and anomaly detection via hierarchical multi-agent retrieval-augmented generation

  • Jisheng Dang
  • Quan Wan
  • Dewei Liu
  • Ziyue Wang
  • Bimei Wang*
  • Pei Liu
  • Hong Peng
  • Bin Hu
  • Tat Seng Chua
  • *Corresponding author for this work
  • Lanzhou University
  • Hong Kong University of Science and Technology
  • Beijing Institute of Technology
  • National University of Singapore

Research output: Contribution to journal › Article › peer-review

Abstract

Multimodal video reasoning and anomaly detection remain key challenges for Large Language Models (LLMs) due to limited video-text alignment, narrow knowledge coverage, and difficulties in handling complex or weakly video-related queries. To address these limitations, we propose a Hierarchical Multi-Agent Retrieval-Augmented Generation (HM-RAG) framework that integrates internal temporal understanding with external knowledge retrieval. Specifically, our approach operates through a coordinated pipeline: it begins with a question decomposition agent that reformulates complex queries into structured sub-tasks, followed by multi-source reasoning agents, comprising a web agent for external retrieval and a memory-enhanced model for long-range temporal dependencies. Finally, a decision agent synthesizes these multi-source insights to resolve contradictions and generate precise predictions. By hierarchically coordinating agents across retrieval and reasoning modalities, our framework achieves effective knowledge fusion. Extensive evaluations demonstrate that HM-RAG not only significantly improves performance on standard multimodal video reasoning benchmarks but also effectively identifies irregular events, validating its robustness in video anomaly detection tasks. Code is available at https://github.com/hanzif1/HM-RAG.
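
The three-stage flow described in the abstract (decompose, reason over multiple sources, decide) can be illustrated with a minimal Python sketch. This is not the authors' implementation: every name here (`Evidence`, `decompose`, `web_agent`, `memory_agent`, `decide`, `run_pipeline`) is hypothetical, the agents are canned stubs rather than real retrieval or video models, and the confidence-based tie-break is an assumed stand-in for whatever the actual decision agent does.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Evidence:
    source: str        # which agent produced this answer
    answer: str
    confidence: float  # hypothetical score in [0, 1]


Agent = Callable[[str], Evidence]


def decompose(question: str) -> List[str]:
    """Toy stand-in for the question decomposition agent: split on ' and '."""
    parts = [p.strip(" ?") for p in question.split(" and ")]
    return [p + "?" for p in parts if p]


def web_agent(sub_question: str) -> Evidence:
    """Placeholder for external retrieval; returns a canned answer."""
    return Evidence("web", f"web answer to: {sub_question}", 0.6)


def memory_agent(sub_question: str) -> Evidence:
    """Placeholder for the memory-enhanced video model (assumed scoring rule)."""
    conf = 0.9 if "video" in sub_question.lower() else 0.3
    return Evidence("memory", f"video answer to: {sub_question}", conf)


def decide(candidates: List[Evidence]) -> Evidence:
    """Toy decision agent: keep the most confident candidate answer."""
    return max(candidates, key=lambda e: e.confidence)


def run_pipeline(question: str, agents: List[Agent]) -> List[Evidence]:
    """Decompose the query, ask every agent each sub-question, then decide."""
    results = []
    for sub in decompose(question):
        candidates = [agent(sub) for agent in agents]
        results.append(decide(candidates))
    return results
```

In this sketch, a sub-question that mentions the video is resolved by the memory stub, while a weakly video-related one falls back to the web stub, which mirrors the division of labor the abstract describes only in spirit.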

Original language: English
Article number: 113622
Journal: Pattern Recognition
Volume: 179
DOI
Publication status: Published - Nov 2026
Externally published
