A Self-supervised Strategy for the Robustness of VQA Models

Jingyu Su, Chuanhao Li, Chenchen Jing, Yuwei Wu*

*Corresponding author of this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In visual question answering (VQA), most existing models suffer from language biases, which make them less robust. Recently, many approaches have been proposed to alleviate language biases by generating samples for the VQA task. These methods require the model to distinguish original samples from synthetic samples, ensuring that the model fully understands both visual and linguistic modalities rather than predicting answers based on language biases alone. However, these models are still not sensitive enough to changes to key information in questions. To make full use of the key information in questions, we design a self-supervised strategy that makes VQA models focus on the nouns in questions, enhancing their robustness. Its auxiliary training process, which predicts answers for synthetic samples generated by masking the last noun in each question, alleviates the negative influence of language biases. Experiments conducted on the VQA-CP v2 and VQA v2 datasets show that our method achieves better results than other VQA models.
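The abstract's auxiliary training step hinges on generating a synthetic sample by masking the last noun in a question. A minimal sketch of that masking step is shown below; it assumes noun identification is done upstream by a POS tagger, so the nouns are passed in explicitly, and the mask token `[MASK]` is an illustrative placeholder, not necessarily the one used in the paper.

```python
# Hypothetical sketch: build a synthetic question by masking the last noun.
# Noun detection is assumed to come from an external POS tagger; here the
# nouns are supplied as a list. "[MASK]" is a placeholder mask token.

MASK = "[MASK]"


def mask_last_noun(question: str, nouns: list[str]) -> str:
    """Replace the last noun occurrence in `question` with a mask token."""
    tokens = question.split()
    noun_set = {n.lower() for n in nouns}
    # Scan from the end so the *last* noun is the one masked.
    for i in range(len(tokens) - 1, -1, -1):
        word = tokens[i].strip("?.,!")
        if word.lower() in noun_set:
            trailing = tokens[i][len(word):]  # keep trailing punctuation
            tokens[i] = MASK + trailing
            return " ".join(tokens)
    return question  # no noun found: return the question unchanged


print(mask_last_noun("What color is the umbrella?", ["color", "umbrella"]))
# → What color is the [MASK]?
```

The original sample and its masked counterpart would then form a pair whose answers the model is trained to predict, discouraging it from relying on question-only (language-bias) shortcuts.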

Original language: English
Title of host publication: Intelligent Information Processing XI - 12th IFIP TC 12 International Conference, IIP 2022, Proceedings
Editors: Zhongzhi Shi, Jean-Daniel Zucker, Bo An
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 290-298
Number of pages: 9
ISBN (Print): 9783031039478
DOI
Publication status: Published - 2022
Event: 12th IFIP TC 12 International Conference on Intelligent Information Processing, IIP 2022 - Qingdao, China
Duration: 27 May 2022 - 30 May 2022

Publication series

Name: IFIP Advances in Information and Communication Technology
Volume: 643 IFIP
ISSN (Print): 1868-4238
ISSN (Electronic): 1868-422X

Conference

Conference: 12th IFIP TC 12 International Conference on Intelligent Information Processing, IIP 2022
Country/Territory: China
City: Qingdao
Period: 27/05/22 - 30/05/22

