
Compositional Substitutivity of Visual Reasoning for Visual Question Answering

  • Chuanhao Li
  • Zhen Li
  • Chenchen Jing*
  • Yuwei Wu*
  • Mingliang Zhai
  • Yunde Jia
  • *Corresponding author for this work
  • Beijing Institute of Technology
  • Shenzhen MSU-BIT University
  • Zhejiang University

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

Abstract

Compositional generalization has recently received much attention in vision-and-language and visual reasoning. Substitutivity, the capability to generalize to novel compositions with synonymous primitives such as words and visual entities, is an essential factor in evaluating compositional generalization ability but remains largely unexplored. In this paper, we explore the compositional substitutivity of visual reasoning in the context of visual question answering (VQA). We propose a training framework for VQA models to maintain compositional substitutivity. The basic idea is to learn invariant representations for synonymous primitives via support sets. Specifically, for each question-image pair, we construct a support question set and a support image set; both sets contain questions/images that share synonymous primitives with the original question/image. By requiring a VQA model to reconstruct the original question/image from these sets, the model learns to identify which primitives are synonymous. To quantitatively evaluate the substitutivity of VQA models, we introduce two datasets, GQA-SPS and VQA-SPS v2, built by performing three types of substitutions using synonymous primitives: words, visual entities, and referents. Experimental results demonstrate the effectiveness of our framework. We release GQA-SPS and VQA-SPS v2 at https://github.com/NeverMoreLCH/CG-SPS.
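
The support-set idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the dot-product attention, the mean-squared-error loss, the three-dimensional embeddings, the added distractor vector, and the function names `reconstruct` and `reconstruction_loss` are all assumptions chosen only to show how reconstructing an item from a support set could push a model to weight synonymous items highly.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def reconstruct(query, support):
    """Rebuild `query` as an attention-weighted average of `support`.

    Hypothetical helper: attention weights come from dot-product
    similarity between the query and each support vector.
    """
    weights = softmax([dot(query, s) for s in support])
    recon = [
        sum(w * s[i] for w, s in zip(weights, support))
        for i in range(len(query))
    ]
    return recon, weights


def reconstruction_loss(query, recon):
    """Mean squared error between the original and its reconstruction."""
    return sum((q - r) ** 2 for q, r in zip(query, recon)) / len(query)


# Toy embeddings (assumed values, not taken from the paper).
query = [1.0, 0.0, 1.0]
support = [
    [0.9, 0.1, 1.1],     # shares a synonymous primitive with the query
    [1.1, -0.1, 0.9],    # shares a synonymous primitive with the query
    [-1.0, 1.0, -1.0],   # unrelated distractor, added for illustration
]

recon, weights = reconstruct(query, support)
loss = reconstruction_loss(query, recon)
print("attention weights:", [round(w, 3) for w in weights])
print("reconstruction loss:", round(loss, 5))
```

In this toy setting the two synonymous support vectors receive nearly all of the attention weight, so the reconstruction stays close to the original. A training objective of this general shape would penalize representations in which synonymous primitives are far apart.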

Original language: English
Title of host publication: Computer Vision – ECCV 2024 - 18th European Conference, Proceedings
Editors: Aleš Leonardis, Elisa Ricci, Stefan Roth, Olga Russakovsky, Torsten Sattler, Gül Varol
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 143-160
Number of pages: 18
ISBN (Print): 9783031731945
Publication status: Published - 2025
Event: 18th European Conference on Computer Vision, ECCV 2024 - Milan, Italy
Duration: 29 Sep 2024 to 4 Oct 2024

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 15106 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 18th European Conference on Computer Vision, ECCV 2024
Country/Territory: Italy
City: Milan
Period: 29/09/24 to 4/10/24
