A Self-supervised Strategy for the Robustness of VQA Models

Jingyu Su, Chuanhao Li, Chenchen Jing, Yuwei Wu*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In visual question answering (VQA), most existing models suffer from language biases, which make them less robust. Recently, many approaches have been proposed to alleviate language biases by generating additional samples for the VQA task. These methods require the model to distinguish original samples from synthetic ones, so that the model fully understands both the visual and the linguistic modalities rather than predicting answers from language biases alone. However, these models are still not sensitive enough to changes in the key information of questions. To make full use of this key information, we design a self-supervised strategy that makes VQA models focus on the nouns in questions, thereby enhancing their robustness. Its auxiliary training process, which predicts answers for synthetic samples generated by masking the last noun in each question, alleviates the negative influence of language biases. Experiments conducted on the VQA-CP v2 and VQA v2 datasets show that our method achieves better results than the other VQA models evaluated.
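The sample-generation step described in the abstract (masking the last noun in a question) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the question has already been tokenized and part-of-speech tagged with Penn Treebank tags (NN, NNS, NNP, NNPS mark nouns), and the placeholder token `[MASK]` and the function name `mask_last_noun` are hypothetical. The abstract does not specify how nouns are detected, which mask token is used, or how the auxiliary loss is defined.

```python
from typing import List, Sequence, Tuple

# Assumption: questions are POS-tagged with the Penn Treebank tag set.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}
# Hypothetical placeholder token; the paper's actual masking scheme is not given here.
MASK_TOKEN = "[MASK]"


def mask_last_noun(
    tagged: Sequence[Tuple[str, str]], mask_token: str = MASK_TOKEN
) -> List[str]:
    """Return the question tokens with the last noun replaced by mask_token.

    If the question contains no noun, the tokens are returned unchanged.
    """
    tokens = [tok for tok, _ in tagged]
    # Scan from the end so the first noun found is the last noun in the question.
    for i in range(len(tagged) - 1, -1, -1):
        if tagged[i][1] in NOUN_TAGS:
            tokens[i] = mask_token
            break
    return tokens


if __name__ == "__main__":
    question = [("What", "WP"), ("color", "NN"), ("is", "VBZ"),
                ("the", "DT"), ("dog", "NN"), ("?", ".")]
    print(mask_last_noun(question))
```

For the question "What color is the dog?", only "dog" is masked, while the earlier noun "color" is kept. Scanning backwards stops at the first match, so exactly one token is replaced per question.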

Original language: English
Title of host publication: Intelligent Information Processing XI - 12th IFIP TC 12 International Conference, IIP 2022, Proceedings
Editors: Zhongzhi Shi, Jean-Daniel Zucker, Bo An
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 290-298
Number of pages: 9
ISBN (Print): 9783031039478
Publication status: Published - 2022
Event: 12th IFIP TC 12 International Conference on Intelligent Information Processing, IIP 2022 - Qingdao, China
Duration: 27 May 2022 to 30 May 2022

Publication series

Name: IFIP Advances in Information and Communication Technology
Volume: 643 IFIP
ISSN (Print): 1868-4238
ISSN (Electronic): 1868-422X

Conference

Conference: 12th IFIP TC 12 International Conference on Intelligent Information Processing, IIP 2022
Country/Territory: China
City: Qingdao
Period: 27/05/22 to 30/05/22

Keywords

  • Language bias
  • Self-supervised learning
  • Visual question answering
