An analysis of graph convolutional networks and recent datasets for visual question answering

Abdulganiyu Abdu Yusuf*, Feng Chong, Mao Xianling

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-reviewed

28 Citations (Scopus)

Abstract

Graph neural networks are a deep learning approach that has recently been widely applied to both structural and non-structural scenarios because of their strong performance and interpretability. In non-structural scenarios, textual and visual research topics such as visual question answering (VQA) are important and call for graph reasoning models. VQA aims to build a system that can answer questions about a given image and understand the semantic meaning underlying that image. The critical issues in VQA are to extract the visual and textual features effectively and to project both sets of features into a common space. These issues strongly affect the handling of goal-driven, reasoning, and scene classification subtasks. Likewise, it is difficult to compare the performance of models because most existing datasets do not group instances into meaningful categories. With recent advances in graph-based models, considerable effort has been devoted to solving these problems. This study focuses on graph convolutional network (GCN) studies and recent datasets for VQA tasks. Specifically, we review current studies on GCNs for the VQA task. We also examine 18 common and recent VQA datasets, though not all of them are discussed at the same level of detail. A critical review of GCNs, datasets, and VQA challenges is then presented. Finally, this study will help researchers choose a suitable dataset for a particular VQA subtask, identify VQA challenges and the pros and cons of existing approaches, and further improve GCNs for VQA.

Original language: English
Pages (from-to): 6277-6300
Number of pages: 24
Journal: Artificial Intelligence Review
Volume: 55
Issue number: 8
DOI
Publication status: Published - Dec 2022
