Joint Learning of Object Graph and Relation Graph for Visual Question Answering

Hao Li, Xu Li, Belhal Karimi, Jie Chen, Mingming Sun*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

15 Citations (Scopus)

Abstract

Modeling visual question answering (VQA) through scene graphs can significantly improve reasoning accuracy and interpretability. However, existing models perform poorly on complex reasoning questions that involve attributes or relations, leading to false attribute selection or missing relations, as in Figure 1(a). This is because these models cannot balance all kinds of information in scene graphs and neglect relation and attribute information. In this paper, we introduce a novel Dual Message-passing enhanced Graph Neural Network (DM-GNN), which obtains a balanced representation by properly encoding multi-scale scene graph information. Specifically, we (i) transform the scene graph into two graphs with diversified focuses on objects and relations, and design a dual structure to encode them, which increases the weights from relations; (ii) fuse the encoder output with attribute features, which increases the weights from attributes; and (iii) propose a message-passing mechanism to enhance the information transfer between objects, relations, and attributes. We conduct extensive experiments on datasets including GQA, VG, and motif-VG and achieve new state-of-the-art results.
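
Step (i) of the abstract can be illustrated with a small sketch. This is not the DM-GNN implementation: the abstract does not say how the two graphs are built, so the construction below is only one plausible reading, in which the object graph links objects that share a relation and the relation graph links relations that share an object. The function name `split_scene_graph`, the triple format, and the example triples are all hypothetical.

```python
from itertools import combinations


def split_scene_graph(triples):
    """Split (subject, predicate, object) triples into two adjacency maps.

    Assumed construction (not taken from the paper):
    - object graph: objects are nodes, linked when a relation connects them
    - relation graph: relations are nodes, linked when they share an object
    """
    object_graph = {}    # object name -> set of neighbouring object names
    relation_graph = {}  # relation index -> set of neighbouring relation indices
    touching = {}        # object name -> indices of relations that mention it

    for idx, (subj, _pred, obj) in enumerate(triples):
        object_graph.setdefault(subj, set()).add(obj)
        object_graph.setdefault(obj, set()).add(subj)
        relation_graph.setdefault(idx, set())
        touching.setdefault(subj, []).append(idx)
        touching.setdefault(obj, []).append(idx)

    # Two relations become neighbours when they mention the same object.
    for indices in touching.values():
        for a, b in combinations(indices, 2):
            if a != b:
                relation_graph[a].add(b)
                relation_graph[b].add(a)

    return object_graph, relation_graph


# Hypothetical scene: a man holding a cup that sits on a table.
triples = [
    ("man", "holding", "cup"),
    ("cup", "on", "table"),
    ("man", "near", "table"),
]
object_graph, relation_graph = split_scene_graph(triples)
```

Under this reading, each graph could be passed to its own encoder, which is one way to picture the "dual structure" the abstract mentions; the actual encoders and fusion with attribute features are described in the paper itself.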

Original language: English
Title of host publication: ICME 2022 - IEEE International Conference on Multimedia and Expo 2022, Proceedings
Publisher: IEEE Computer Society
ISBN (Electronic): 9781665485630
Publication status: Published - 2022
Externally published: Yes
Event: 2022 IEEE International Conference on Multimedia and Expo, ICME 2022 - Taipei, Taiwan, Province of China
Duration: 18 Jul 2022 → 22 Jul 2022

Publication series

Name: Proceedings - IEEE International Conference on Multimedia and Expo
Volume: 2022-July
ISSN (Print): 1945-7871
ISSN (Electronic): 1945-788X

Conference

Conference: 2022 IEEE International Conference on Multimedia and Expo, ICME 2022
Country/Territory: Taiwan, Province of China
City: Taipei
Period: 18/07/22 → 22/07/22

Keywords

  • Graph Neural Network
  • Scene Graph
  • Visual Question Answer
