Overcoming language priors in VQA via decomposed linguistic representations

Chenchen Jing, Yuwei Wu*, Xiaoxun Zhang, Yunde Jia, Qi Wu

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

77 Citations (Scopus)

Abstract

Most existing Visual Question Answering (VQA) models rely excessively on language priors between questions and answers. In this paper, we present a novel language attention-based VQA method that learns decomposed linguistic representations of questions and uses these representations to infer answers, thereby overcoming language priors. We introduce a modular language attention mechanism that parses a question into three phrase representations: a type representation, an object representation, and a concept representation. The type representation identifies the question type and the set of possible answers (yes/no, or specific concepts such as colors or numbers), and the object representation focuses attention on the relevant region of an image. The concept representation is then verified against the attended region to infer the final answer. The proposed method decouples language-based concept discovery from vision-based concept verification during answer inference, preventing language priors from dominating the answering process. Experiments on the VQA-CP dataset demonstrate the effectiveness of our method.
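The abstract describes the architecture only at a high level. As an illustration, the following is a minimal sketch of the modular language attention idea, assuming GRU word features and three independent soft-attention heads, one per linguistic role; all module names, dimensions, and design details are assumptions for exposition, not the authors' implementation.

```python
# Minimal sketch (NOT the paper's code) of a modular language attention
# mechanism: three parallel attention heads over the question's word
# features yield the type / object / concept representations.
import torch
import torch.nn as nn

class DecomposedQuestionEncoder(nn.Module):
    def __init__(self, vocab_size, embed_dim=300, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        # One attention scorer per linguistic role: type, object, concept.
        self.attn = nn.ModuleList([nn.Linear(hidden_dim, 1) for _ in range(3)])

    def forward(self, question_tokens):
        # question_tokens: (batch, seq_len) integer word ids
        h, _ = self.gru(self.embed(question_tokens))    # (batch, seq_len, hidden)
        reps = []
        for scorer in self.attn:
            weights = torch.softmax(scorer(h), dim=1)   # (batch, seq_len, 1)
            reps.append((weights * h).sum(dim=1))       # (batch, hidden)
        type_rep, object_rep, concept_rep = reps
        return type_rep, object_rep, concept_rep

# Toy usage: two 14-word questions with a hypothetical 10k-word vocabulary.
encoder = DecomposedQuestionEncoder(vocab_size=10000)
q = torch.randint(0, 10000, (2, 14))
t, o, c = encoder(q)
print(t.shape, o.shape, c.shape)  # each: torch.Size([2, 512])
```

Per the abstract, the type representation would drive answer-set selection, the object representation would guide visual attention over image regions, and the concept representation would be verified against the attended region; those downstream modules are omitted here.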

Original language: English
Title of host publication: AAAI 2020 - 34th AAAI Conference on Artificial Intelligence
Publisher: AAAI Press
Pages: 11181-11188
Number of pages: 8
ISBN (electronic): 9781577358350
Publication status: Published - 2020
Event: 34th AAAI Conference on Artificial Intelligence, AAAI 2020 - New York, United States
Duration: 7 Feb 2020 - 12 Feb 2020

Publication series

Name: AAAI 2020 - 34th AAAI Conference on Artificial Intelligence

Conference

Conference: 34th AAAI Conference on Artificial Intelligence, AAAI 2020
Country/Territory: United States
City: New York
Period: 7/02/20 - 12/02/20
