An analysis of graph convolutional networks and recent datasets for visual question answering

Abdulganiyu Abdu Yusuf; Feng Chong; Mao Xianling

doi:10.1007/s10462-022-10151-2

Abdulganiyu Abdu Yusuf^*, Feng Chong, Mao Xianling

^*Corresponding author for this work

School of Computer Science

Research output: Contribution to journal › Article › Peer-review

34 Citations (Scopus)

Abstract

The graph neural network is a deep learning approach that has recently been widely applied in structural and non-structural scenarios owing to its strong performance and interpretability. In non-structural scenarios, textual and visual research topics such as visual question answering (VQA) are important and require graph reasoning models. VQA aims to build a system that can answer questions about a given image and understand the semantic meaning underlying the image. The critical issues in VQA are to extract visual and textual features effectively and to project both into a common space. These issues strongly affect how well goal-driven, reasoning, and scene classification subtasks are handled. Similarly, it is difficult to compare models' performance because most existing datasets do not group instances into meaningful categories. With recent advances in graph-based models, much effort has been devoted to solving these problems. This study focuses on graph convolutional network (GCN) studies and recent datasets for VQA tasks. Specifically, we review current studies on GCN for the VQA task. In addition, 18 common and recent datasets for VQA are studied, though not all of them are discussed at the same level of detail. A critical review of GCN, datasets, and VQA challenges is also presented. Finally, this study should help researchers choose a suitable dataset for a particular VQA subtask, identify VQA challenges and the pros and cons of current VQA approaches, and further improve GCN for VQA.

Original language	English
Pages (from-to)	6277-6300
Number of pages	24
Journal	Artificial Intelligence Review
Volume	55
Issue number	8
DOI	https://doi.org/10.1007/s10462-022-10151-2
Publication status	Published - Dec 2022

Cite this

Yusuf, A. A., Feng, C., & Mao, X. (2022). An analysis of graph convolutional networks and recent datasets for visual question answering. Artificial Intelligence Review, 55(8), 6277-6300. https://doi.org/10.1007/s10462-022-10151-2

@article{b3eb4db6f44e4ee091fb5b2393590cfc,

title = "An analysis of graph convolutional networks and recent datasets for visual question answering",

abstract = "Graph neural network is a deep learning approach widely applied on structural and non-structural scenarios due to its substantial performance and interpretability recently. In a non-structural scenario, textual and visual research topics like visual question answering (VQA) are important, which need graph reasoning models. VQA aims to build a system that can answer related questions about given images as well as understand the underlying semantic meaning behind the image. The critical issues in VQA are to effectively extract the visual and textual features and subject both features into a common space. These issues have a great impact in handling goal-driven, reasoning, and scene classification subtasks. In the same vein, it is difficult to compare models' performance because most existing datasets do not group instances into meaningful categories. With the recent advances in graph-based models, lots of efforts have been devoted to solving the problems mentioned above. This study focuses on graph convolutional networks (GCN) studies and recent datasets for visual question answering tasks. Specifically, we reviewed current related studies on GCN for the VQA task. Also, 18 common and recent datasets for VQA are well studied, though not all of them are discussed at the same level of detail. A critical review of GCN, datasets and VQA challenges is further highlighted. Finally, this study will help researchers to choose a suitable dataset for a particular VQA subtask, identify VQA challenges, the pros and cons of its approaches, and improve more on GCN for the VQA.",

keywords = "Computer vision, Datasets, GCN, NLP, VQA",

author = "Yusuf, {Abdulganiyu Abdu} and Feng, Chong and Mao, Xianling",

note = "Publisher Copyright: {\textcopyright} 2022, The Author(s), under exclusive licence to Springer Nature B.V.",

year = "2022",

month = dec,

doi = "10.1007/s10462-022-10151-2",

language = "English",

volume = "55",

pages = "6277--6300",

journal = "Artificial Intelligence Review",

issn = "0269-2821",

publisher = "Springer Netherlands",

number = "8",

}

TY - JOUR

T1 - An analysis of graph convolutional networks and recent datasets for visual question answering

AU - Yusuf, Abdulganiyu Abdu

AU - Feng, Chong

AU - Mao, Xianling

PY - 2022/12

Y1 - 2022/12

N2 - Graph neural network is a deep learning approach widely applied on structural and non-structural scenarios due to its substantial performance and interpretability recently. In a non-structural scenario, textual and visual research topics like visual question answering (VQA) are important, which need graph reasoning models. VQA aims to build a system that can answer related questions about given images as well as understand the underlying semantic meaning behind the image. The critical issues in VQA are to effectively extract the visual and textual features and subject both features into a common space. These issues have a great impact in handling goal-driven, reasoning, and scene classification subtasks. In the same vein, it is difficult to compare models' performance because most existing datasets do not group instances into meaningful categories. With the recent advances in graph-based models, lots of efforts have been devoted to solving the problems mentioned above. This study focuses on graph convolutional networks (GCN) studies and recent datasets for visual question answering tasks. Specifically, we reviewed current related studies on GCN for the VQA task. Also, 18 common and recent datasets for VQA are well studied, though not all of them are discussed at the same level of detail. A critical review of GCN, datasets and VQA challenges is further highlighted. Finally, this study will help researchers to choose a suitable dataset for a particular VQA subtask, identify VQA challenges, the pros and cons of its approaches, and improve more on GCN for the VQA.

AB - Graph neural network is a deep learning approach widely applied on structural and non-structural scenarios due to its substantial performance and interpretability recently. In a non-structural scenario, textual and visual research topics like visual question answering (VQA) are important, which need graph reasoning models. VQA aims to build a system that can answer related questions about given images as well as understand the underlying semantic meaning behind the image. The critical issues in VQA are to effectively extract the visual and textual features and subject both features into a common space. These issues have a great impact in handling goal-driven, reasoning, and scene classification subtasks. In the same vein, it is difficult to compare models' performance because most existing datasets do not group instances into meaningful categories. With the recent advances in graph-based models, lots of efforts have been devoted to solving the problems mentioned above. This study focuses on graph convolutional networks (GCN) studies and recent datasets for visual question answering tasks. Specifically, we reviewed current related studies on GCN for the VQA task. Also, 18 common and recent datasets for VQA are well studied, though not all of them are discussed at the same level of detail. A critical review of GCN, datasets and VQA challenges is further highlighted. Finally, this study will help researchers to choose a suitable dataset for a particular VQA subtask, identify VQA challenges, the pros and cons of its approaches, and improve more on GCN for the VQA.

KW - Computer vision

KW - Datasets

KW - GCN

KW - NLP

KW - VQA

UR - http://www.scopus.com/inward/record.url?scp=85127682728&partnerID=8YFLogxK

U2 - 10.1007/s10462-022-10151-2

DO - 10.1007/s10462-022-10151-2

M3 - Article

AN - SCOPUS:85127682728

SN - 0269-2821

VL - 55

SP - 6277

EP - 6300

JO - Artificial Intelligence Review

JF - Artificial Intelligence Review

IS - 8

ER -
