Syntax Tree Constrained Graph Network for Visual Question Answering

Xiangrui Su, Qi Zhang, Chongyang Shi*, Jiachang Liu, Liang Hu

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Citation (Scopus)

Abstract

Visual Question Answering (VQA) aims to automatically answer natural language questions about the content of a given image. Existing VQA methods integrate vision modeling and language understanding to explore the deep semantics of the question. However, these methods ignore the syntax information of the question, which plays a vital role in understanding its essential semantics and in guiding visual feature refinement. To fill this gap, we propose a novel Syntax Tree Constrained Graph Network (STCGN) for VQA, based on entity message passing and the question's syntax tree. The model extracts a syntax tree from each question and thereby obtains more precise syntax information. Specifically, we parse each question into a syntax tree using the Stanford syntax parsing tool. A hierarchical tree convolutional network then extracts syntactic phrase features and question features at the word level and the phrase level. Finally, we design a message-passing mechanism for phrase-aware visual entities and capture entity features according to the given visual context. Extensive experiments on the VQA2.0 dataset demonstrate the superiority of our proposed model.
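
The bottom-up flow from word features to phrase features to a question feature can be illustrated with a minimal sketch. This is not the authors' implementation: the `Node` class, the hand-written parse of "what color is the dog", the toy 2-dimensional word vectors, and the use of mean pooling in place of a learned tree convolution are all assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Vector = List[float]


@dataclass
class Node:
    """Hypothetical syntax-tree node: a phrase tag, or the word itself for a leaf."""
    label: str
    children: List["Node"] = field(default_factory=list)


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean; a stand-in for a learned tree convolution."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def phrase_features(node: Node,
                    word_vecs: Dict[str, Vector],
                    out: List[Tuple[str, Vector]]) -> Vector:
    """Compute features bottom-up; record one (tag, feature) pair per phrase node."""
    if not node.children:                      # leaf: look up the word feature
        return word_vecs[node.label]
    child_feats = [phrase_features(c, word_vecs, out) for c in node.children]
    feat = mean(child_feats)                   # phrase feature from its children
    out.append((node.label, feat))
    return feat


# Hand-written parse (assumed, not parser output) of "what color is the dog".
tree = Node("ROOT", [
    Node("WHNP", [Node("what"), Node("color")]),
    Node("VP", [Node("is"), Node("NP", [Node("the"), Node("dog")])]),
])

# Toy word features (assumed values, chosen only to keep the arithmetic simple).
word_vecs = {
    "what": [1.0, 0.0], "color": [0.0, 1.0], "is": [1.0, 1.0],
    "the": [0.0, 0.0], "dog": [2.0, 2.0],
}

phrases: List[Tuple[str, Vector]] = []
question_feat = phrase_features(tree, word_vecs, phrases)

for tag, feat in phrases:
    print(tag, feat)
print("question", question_feat)
```

In a full model, the recorded phrase features would be the inputs that guide message passing among visual entities; here they are only printed.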

Original language: English
Title of host publication: Neural Information Processing - 30th International Conference, ICONIP 2023, Proceedings
Editors: Biao Luo, Long Cheng, Zheng-Guang Wu, Hongyi Li, Chaojie Li
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 122-136
Number of pages: 15
ISBN (Print): 9789819980727
Publication status: Published - 2024
Event: 30th International Conference on Neural Information Processing, ICONIP 2023 - Changsha, China
Duration: 20 Nov 2023 to 23 Nov 2023

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 14451 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 30th International Conference on Neural Information Processing, ICONIP 2023
Country/Territory: China
City: Changsha
Period: 20/11/23 to 23/11/23

Keywords

  • Graph neural network
  • Message passing
  • Syntax tree
  • Tree convolution
  • Visual question answering
