Let Me Show You Step by Step: An Interpretable Graph Routing Network for Knowledge-based Visual Question Answering

Duokang Wang, Linmei Hu*, Rui Hao, Yingxia Shao, Xin Lv, Liqiang Nie, Juanzi Li

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

1 Citation (Scopus)

Abstract

Visual Question Answering based on external Knowledge Bases (KB-VQA) requires a model to incorporate knowledge beyond the content of the given image and question to predict the answer. Most existing work uses graph neural networks or Multi-modal Large Language Models to incorporate external knowledge for answer generation. Despite promising results, these approaches have limited interpretability and handle questions with unseen answers poorly. In this paper, we propose a novel interpretable graph routing network (GRN) that explicitly conducts entity routing, step by step, over a constructed scene knowledge graph for KB-VQA. At each step, GRN maintains an entity score vector representing how likely each entity is to be activated as the answer, and a transition matrix representing the transition probability from one entity to another. To answer a given question, GRN focuses on certain keywords of the question at each step and conducts entity routing accordingly, propagating the entity scores through the transition matrix computed from the focused keywords. In this way, it makes the reasoning process of KB-VQA explicit and handles questions with unseen answers without treating them differently. Experiments on the benchmark dataset KRVQA demonstrate that GRN improves KB-VQA performance by a large margin, surpassing existing state-of-the-art KB-VQA methods and Multi-modal Large Language Models, and that it shows competent capability in handling unseen answers as well as good interpretability.
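
The routing step described in the abstract can be illustrated with a toy example. This is a minimal sketch, not the authors' implementation: the entities, triples, and the helper names `build_transition` and `route` are hypothetical, and the transition matrix is built by hard-matching one relation per step, standing in for whatever learned keyword-focus mechanism GRN actually uses.

```python
# Toy entity routing over a tiny scene knowledge graph (illustrative assumptions only).

def build_transition(entities, triples, focus_relation):
    """Return a matrix T where T[i][j] is the probability of moving from
    entity i to entity j, using only edges whose relation matches the focus."""
    n = len(entities)
    index = {name: k for k, name in enumerate(entities)}
    T = [[0.0] * n for _ in range(n)]
    for head, relation, tail in triples:
        if relation == focus_relation:
            T[index[head]][index[tail]] += 1.0
    # Normalize each non-empty row so it sums to 1.
    for row in T:
        total = sum(row)
        if total > 0:
            for j in range(n):
                row[j] /= total
    return T


def route(scores, T):
    """One routing step: new_scores[j] = sum_i scores[i] * T[i][j]."""
    n = len(scores)
    return [sum(scores[i] * T[i][j] for i in range(n)) for j in range(n)]


# Hypothetical scene knowledge graph for "What kind of object is the dog holding?"
entities = ["dog", "frisbee", "toy", "grass"]
triples = [
    ("dog", "holds", "frisbee"),
    ("frisbee", "is_a", "toy"),
    ("dog", "on", "grass"),
]

scores = [1.0, 0.0, 0.0, 0.0]      # all initial score on "dog"
trace = [scores]                   # keep every step for inspection
for focus in ["holds", "is_a"]:    # assumed per-step question focus
    scores = route(scores, build_transition(entities, triples, focus))
    trace.append(scores)

# The answer is the highest-scoring entity after the last step.
answer = entities[max(range(len(scores)), key=scores.__getitem__)]
print(trace)
print(answer)
```

Each entry of `trace` shows where the score mass sits after a step, which is the sense in which such a routing process can be inspected. In this sketch the answer is read off the graph's entity list rather than chosen from a fixed answer vocabulary.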

Original language: English
Title of host publication: SIGIR 2024 - Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval
Publisher: Association for Computing Machinery, Inc
Pages: 1984-1994
Number of pages: 11
ISBN (Electronic): 9798400704314
Publication status: Published - 10 Jul 2024
Event: 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2024 - Washington, United States
Duration: 14 Jul 2024 to 18 Jul 2024

Publication series

Name: SIGIR 2024 - Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval

Conference

Conference: 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2024
Country/Territory: United States
City: Washington
Period: 14/07/24 to 18/07/24

Keywords

  • graph routing network
  • knowledge-based visual question answering
  • scene knowledge graph
