Evaluation of graph convolutional networks performance for visual question answering on reasoning datasets

Abdulganiyu Abdu Yusuf, Feng Chong*, Mao Xianling

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

7 Citations (Scopus)

Abstract

In recent years, graph neural networks have been widely applied to vision-and-language tasks and have achieved promising results. In particular, the graph convolutional network (GCN) can capture the spatial and semantic relationships needed for visual question answering (VQA). However, applying GCNs to VQA datasets with different subtasks can lead to varying results. Moreover, the training and testing set sizes, evaluation metrics, and hyperparameters used are additional factors that affect VQA results. These factors can be placed under similar evaluation schemes in order to obtain a fair assessment of GCN-based results for VQA. This study proposes a GCN framework for VQA based on fine-tuned word representations to handle reasoning-type questions. The framework's performance is evaluated using various performance measures. The results obtained on the GQA and VQA 2.0 datasets slightly outperform most existing methods.
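To illustrate the kind of relational encoding the abstract refers to, the sketch below shows a single graph convolution layer applied to image-region features, following the standard normalized-adjacency formulation H' = ReLU(D^(-1/2)(A+I)D^(-1/2) H W). This is a minimal, hedged example, not the authors' implementation: the class name `GCNLayer`, the feature dimensions, and the randomly thresholded relation graph are illustrative assumptions.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One graph convolution: H' = ReLU(D^-1/2 (A+I) D^-1/2 H W).
    Hypothetical sketch; not the paper's exact architecture."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, node_feats, adj):
        # node_feats: (N, in_dim) region or token features
        # adj: (N, N) adjacency encoding spatial/semantic relations
        adj_hat = adj + torch.eye(adj.size(0), device=adj.device)  # add self-loops
        deg_inv_sqrt = adj_hat.sum(dim=1).pow(-0.5)
        norm_adj = deg_inv_sqrt.unsqueeze(1) * adj_hat * deg_inv_sqrt.unsqueeze(0)
        return torch.relu(norm_adj @ self.linear(node_feats))

# Usage: 36 region features of dimension 2048 projected to 512 (assumed sizes)
layer = GCNLayer(2048, 512)
regions = torch.randn(36, 2048)
adj = (torch.rand(36, 36) > 0.7).float()   # hypothetical relation graph
adj = ((adj + adj.t()) > 0).float()        # symmetrize
out = layer(regions, adj)                  # (36, 512) relation-aware features
```

In a VQA pipeline of this kind, the relation-aware region features would then be fused with the (fine-tuned) question representation before answer prediction; the fusion step is omitted here.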

Original language: English
Pages (from-to): 40361-40370
Number of pages: 10
Journal: Multimedia Tools and Applications
Volume: 81
Issue number: 28
DOIs
Publication status: Published - Nov 2022

Keywords

  • Fine-tuned representation
  • GCN
  • Performance measure
  • Reasoning datasets
  • VQA
