Evaluation of graph convolutional networks performance for visual question answering on reasoning datasets

Abdulganiyu Abdu Yusuf; Feng Chong; Mao Xianling

doi:10.1007/s11042-022-13065-x

Abdulganiyu Abdu Yusuf, Feng Chong^*, Mao Xianling

^*Corresponding author for this work

School of Computer Science and Technology

Research output: Contribution to journal › Article › peer-review

7 Citations (Scopus)

Abstract

In recent years, graph neural networks have been widely used for vision-to-language tasks and have achieved promising results. In particular, the graph convolutional network (GCN) can capture the spatial and semantic relationships needed for visual question answering (VQA). However, applying GCN to VQA datasets with different subtasks can lead to varying results. The training and testing set sizes, evaluation metrics, and hyperparameters used also affect VQA results. These factors can be subjected to similar evaluation schemes in order to obtain fair evaluations of GCN-based results for VQA. This study proposes a GCN framework for VQA based on fine-tuned word representations to handle reasoning-type questions. The framework's performance is evaluated using various performance measures. The results obtained on the GQA and VQA 2.0 datasets slightly outperform most existing methods.

Original language	English
Pages (from-to)	40361-40370
Number of pages	10
Journal	Multimedia Tools and Applications
Volume	81
Issue number	28
DOIs	https://doi.org/10.1007/s11042-022-13065-x
Publication status	Published - Nov 2022

Keywords

Fine-tuned representation
GCN
Performance measure
Reasoning datasets
VQA

Cite this

@article{c4efe0074dfb434eb1b9834ee19aae66,

title = "Evaluation of graph convolutional networks performance for visual question answering on reasoning datasets",

abstract = "In recent years, graph neural networks have been widely used for vision-to-language tasks and have achieved promising results. In particular, the graph convolutional network (GCN) can capture the spatial and semantic relationships needed for visual question answering (VQA). However, applying GCN to VQA datasets with different subtasks can lead to varying results. The training and testing set sizes, evaluation metrics, and hyperparameters used also affect VQA results. These factors can be subjected to similar evaluation schemes in order to obtain fair evaluations of GCN-based results for VQA. This study proposes a GCN framework for VQA based on fine-tuned word representations to handle reasoning-type questions. The framework's performance is evaluated using various performance measures. The results obtained on the GQA and VQA 2.0 datasets slightly outperform most existing methods.",

keywords = "Fine-tuned representation, GCN, Performance measure, Reasoning datasets, VQA",

author = "Yusuf, {Abdulganiyu Abdu} and Feng Chong and Mao Xianling",

note = "Publisher Copyright: {\textcopyright} 2022, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.",

year = "2022",

month = nov,

doi = "10.1007/s11042-022-13065-x",

language = "English",

volume = "81",

pages = "40361--40370",

journal = "Multimedia Tools and Applications",

issn = "1380-7501",

publisher = "Springer Netherlands",

number = "28",

}

TY - JOUR

T1 - Evaluation of graph convolutional networks performance for visual question answering on reasoning datasets

AU - Yusuf, Abdulganiyu Abdu

AU - Chong, Feng

AU - Xianling, Mao

PY - 2022/11

Y1 - 2022/11

N2 - In recent years, graph neural networks have been widely used for vision-to-language tasks and have achieved promising results. In particular, the graph convolutional network (GCN) can capture the spatial and semantic relationships needed for visual question answering (VQA). However, applying GCN to VQA datasets with different subtasks can lead to varying results. The training and testing set sizes, evaluation metrics, and hyperparameters used also affect VQA results. These factors can be subjected to similar evaluation schemes in order to obtain fair evaluations of GCN-based results for VQA. This study proposes a GCN framework for VQA based on fine-tuned word representations to handle reasoning-type questions. The framework's performance is evaluated using various performance measures. The results obtained on the GQA and VQA 2.0 datasets slightly outperform most existing methods.

AB - In recent years, graph neural networks have been widely used for vision-to-language tasks and have achieved promising results. In particular, the graph convolutional network (GCN) can capture the spatial and semantic relationships needed for visual question answering (VQA). However, applying GCN to VQA datasets with different subtasks can lead to varying results. The training and testing set sizes, evaluation metrics, and hyperparameters used also affect VQA results. These factors can be subjected to similar evaluation schemes in order to obtain fair evaluations of GCN-based results for VQA. This study proposes a GCN framework for VQA based on fine-tuned word representations to handle reasoning-type questions. The framework's performance is evaluated using various performance measures. The results obtained on the GQA and VQA 2.0 datasets slightly outperform most existing methods.

KW - Fine-tuned representation

KW - GCN

KW - Performance measure

KW - Reasoning datasets

KW - VQA

UR - http://www.scopus.com/inward/record.url?scp=85129480969&partnerID=8YFLogxK

U2 - 10.1007/s11042-022-13065-x

DO - 10.1007/s11042-022-13065-x

M3 - Article

AN - SCOPUS:85129480969

SN - 1380-7501

VL - 81

SP - 40361

EP - 40370

JO - Multimedia Tools and Applications

JF - Multimedia Tools and Applications

IS - 28

ER -
