Syntax Tree Constrained Graph Network for Visual Question Answering

Xiangrui Su, Qi Zhang, Chongyang Shi*, Jiachang Liu, Liang Hu

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-review

1 Citation (Scopus)

Abstract

Visual Question Answering (VQA) aims to automatically answer natural language questions about the content of a given image. Existing VQA methods integrate vision modeling and language understanding to explore the deep semantics of the question. However, these methods ignore the syntactic information of the question, which plays a vital role in understanding the essential semantics of the question and in guiding visual feature refinement. To fill this gap, we propose a novel Syntax Tree Constrained Graph Network (STCGN) for VQA based on entity message passing and the syntax tree. The model extracts a syntax tree from each question to obtain more precise syntactic information. Specifically, we parse each question with the Stanford syntax parsing tool to obtain its syntax tree. A hierarchical tree convolutional network then extracts syntactic phrase features and question features at the word level and the phrase level. We then design a message-passing mechanism for phrase-aware visual entities that captures entity features according to the given visual context. Extensive experiments on the VQA2.0 dataset demonstrate the superiority of the proposed model.
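The two-stage idea in the abstract (phrase features built bottom-up over a syntax tree, then phrase-aware message passing to visual entities) can be illustrated with a minimal sketch. This is not the authors' STCGN implementation: the names `Node`, `tree_conv` and `phrase_aware_update` are hypothetical, mean pooling over children stands in for the learned hierarchical tree convolution, and plain dot-product attention stands in for the learned message-passing weights. Word embeddings and entity features are assumed to be given as toy vectors.

```python
from dataclasses import dataclass, field
from math import exp
from typing import List, Tuple

Vector = List[float]


@dataclass
class Node:
    """A syntax-tree node: a constituent label (e.g. "NP") or a word at a leaf."""
    label: str
    children: List["Node"] = field(default_factory=list)
    embedding: Vector = field(default_factory=list)  # assumed given for leaves only


def add(a: Vector, b: Vector) -> Vector:
    return [x + y for x, y in zip(a, b)]


def scale(a: Vector, s: float) -> Vector:
    return [x * s for x in a]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    es = [exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]


def tree_conv(node: Node, phrases: List[Tuple[str, Vector]]) -> Vector:
    """Bottom-up pass: an internal node's feature is the mean of its children's
    features (a stand-in for a learned tree convolution). Every internal node is
    recorded in `phrases` in post-order as a phrase-level feature."""
    if not node.children:
        return node.embedding  # word-level feature
    child_feats = [tree_conv(c, phrases) for c in node.children]
    total = child_feats[0]
    for f in child_feats[1:]:
        total = add(total, f)
    feat = scale(total, 1.0 / len(child_feats))
    phrases.append((node.label, feat))
    return feat


def phrase_aware_update(entities: List[Vector],
                        phrases: List[Tuple[str, Vector]]) -> List[Vector]:
    """One round of message passing: each visual entity attends over all phrase
    features and adds the attention-weighted message to its own feature."""
    updated = []
    for e in entities:
        weights = softmax([dot(e, p) for _, p in phrases])
        msg = [0.0] * len(e)
        for w, (_, p) in zip(weights, phrases):
            msg = add(msg, scale(p, w))
        updated.append(add(e, msg))  # residual update of the entity feature
    return updated


# Usage with a toy two-word question tree (S -> NP -> "red" "car").
root = Node("S", [Node("NP", [Node("red", embedding=[1.0, 0.0]),
                              Node("car", embedding=[0.0, 1.0])])])
phrases: List[Tuple[str, Vector]] = []
question_feat = tree_conv(root, phrases)
entities = phrase_aware_update([[1.0, 0.0]], phrases)
print(question_feat, [label for label, _ in phrases], entities)
```

A real model would replace the mean and the dot product with trainable layers and iterate message passing over an entity graph; the sketch only shows how phrase features from the tree can condition entity features.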

Original language: English
Host publication title: Neural Information Processing - 30th International Conference, ICONIP 2023, Proceedings
Editors: Biao Luo, Long Cheng, Zheng-Guang Wu, Hongyi Li, Chaojie Li
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 122-136
Number of pages: 15
ISBN (Print): 9789819980727
DOI
Publication status: Published - 2024
Event: 30th International Conference on Neural Information Processing, ICONIP 2023 - Changsha, China
Duration: 20 Nov 2023 to 23 Nov 2023

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 14451 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 30th International Conference on Neural Information Processing, ICONIP 2023
Country/Territory: China
City: Changsha
Period: 20/11/23 to 23/11/23
