
METOR: A Unified Framework for Mutual Enhancement of Objects and Relationships in Open-vocabulary Video Visual Relationship Detection

  • Yongqi Wang
  • Xinxiao Wu
  • Shuo Yang*
  • *Corresponding author for this work
  • Beijing Institute of Technology
  • Shenzhen MSU-BIT University

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

Abstract

Open-vocabulary video visual relationship detection aims to detect objects and their relationships in videos without being restricted by predefined object or relationship categories. Existing methods leverage the rich semantic knowledge of pre-trained vision-language models such as CLIP to identify novel categories. They typically adopt a cascaded pipeline to first detect objects and then classify relationships based on the detected objects, which may lead to error propagation and thus suboptimal performance. In this paper, we propose Mutual EnhancemenT of Objects and Relationships (METOR), a query-based unified framework to jointly model and mutually enhance object detection and relationship classification in open-vocabulary scenarios. Under this framework, we first design a CLIP-based contextual refinement encoding module that extracts visual contexts of objects and relationships to refine the encoding of text features and object queries, thus improving the generalization of encoding to novel categories. Then we propose an iterative enhancement module to alternately enhance the representations of objects and relationships by fully exploiting their interdependence to improve recognition performance. Extensive experiments on two public datasets, VidVRD and VidOR, demonstrate that our framework achieves state-of-the-art performance. Code is available at https://github.com/wangyongqi558/METOR.

Original language: English
Title of host publication: Proceedings of the 34th International Joint Conference on Artificial Intelligence, IJCAI 2025
Editors: James Kwok
Publisher: International Joint Conferences on Artificial Intelligence
Pages: 2000-2008
Number of pages: 9
ISBN (Electronic): 9781956792065
DOI
Publication status: Published - 2025
Externally published
Event: 34th International Joint Conference on Artificial Intelligence, IJCAI 2025 - Montreal, Canada
Duration: 16 Aug 2025 to 22 Aug 2025

Publication series

Name: IJCAI International Joint Conference on Artificial Intelligence
ISSN (Print): 1045-0823

Conference

Conference: 34th International Joint Conference on Artificial Intelligence, IJCAI 2025
Country/Territory: Canada
City: Montreal
Period: 16/08/25 to 22/08/25
