An Empirical Study on the Language Modal in Visual Question Answering

Daowan Peng, Wei Wei^*, Xian Ling Mao, Yuanyuan Fu, Dangyang Chen

^*Corresponding author for this work

School of Computer Science and Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Citation (Scopus)

Abstract

Generalization beyond in-domain experience to out-of-distribution data is of paramount significance in the AI domain. Of late, state-of-the-art Visual Question Answering (VQA) models have shown impressive performance on in-domain data, partially due to the language priors bias which, however, hinders the generalization ability in practice. This paper attempts to provide new insights into the influence of language modality on VQA performance from an empirical study perspective. To achieve this, we conducted a series of experiments on six models. The results of these experiments revealed that, 1) apart from prior bias caused by question types, there is a notable influence of postfix-related bias in inducing biases, and 2) training VQA models with word-sequence-related variant questions demonstrated improved performance on the out-of-distribution benchmark, and the LXMERT even achieved a 10-point gain without adopting any debiasing methods. We delved into the underlying reasons behind these experimental results and put forward some simple proposals to reduce the models' dependency on language priors. The experimental results demonstrated the effectiveness of our proposed method in improving performance on the out-of-distribution benchmark, VQA-CPv2. We hope this study can inspire novel insights for future research on designing bias-reduction approaches.

Original language	English
Title of host publication	Proceedings of the 32nd International Joint Conference on Artificial Intelligence, IJCAI 2023
Editors	Edith Elkind
Publisher	International Joint Conferences on Artificial Intelligence
Pages	4109-4117
Number of pages	9
ISBN (Electronic)	9781956792034
Publication status	Published - 2023
Event	32nd International Joint Conference on Artificial Intelligence, IJCAI 2023 - Macao, China Duration: 19 Aug 2023 → 25 Aug 2023

Publication series

Name	IJCAI International Joint Conference on Artificial Intelligence
Volume	2023-August
ISSN (Print)	1045-0823

Conference

Conference	32nd International Joint Conference on Artificial Intelligence, IJCAI 2023
Country/Territory	China
City	Macao
Period	19/08/23 → 25/08/23

Cite this

Peng, D., Wei, W., Mao, X. L., Fu, Y., & Chen, D. (2023). An Empirical Study on the Language Modal in Visual Question Answering. In E. Elkind (Ed.), Proceedings of the 32nd International Joint Conference on Artificial Intelligence, IJCAI 2023 (pp. 4109-4117). (IJCAI International Joint Conference on Artificial Intelligence; Vol. 2023-August). International Joint Conferences on Artificial Intelligence.

Peng, Daowan ; Wei, Wei ; Mao, Xian Ling et al. / An Empirical Study on the Language Modal in Visual Question Answering. Proceedings of the 32nd International Joint Conference on Artificial Intelligence, IJCAI 2023. editor / Edith Elkind. International Joint Conferences on Artificial Intelligence, 2023. pp. 4109-4117 (IJCAI International Joint Conference on Artificial Intelligence).

@inproceedings{efa989c0ac144f15b2b6da8a3892829d,

title = "An Empirical Study on the Language Modal in Visual Question Answering",

abstract = "Generalization beyond in-domain experience to out-of-distribution data is of paramount significance in the AI domain. Of late, state-of-the-art Visual Question Answering (VQA) models have shown impressive performance on in-domain data, partially due to the language priors bias which, however, hinders the generalization ability in practice. This paper attempts to provide new insights into the influence of language modality on VQA performance from an empirical study perspective. To achieve this, we conducted a series of experiments on six models. The results of these experiments revealed that, 1) apart from prior bias caused by question types, there is a notable influence of postfix-related bias in inducing biases, and 2) training VQA models with word-sequence-related variant questions demonstrated improved performance on the out-of-distribution benchmark, and the LXMERT even achieved a 10-point gain without adopting any debiasing methods. We delved into the underlying reasons behind these experimental results and put forward some simple proposals to reduce the models' dependency on language priors. The experimental results demonstrated the effectiveness of our proposed method in improving performance on the out-of-distribution benchmark, VQA-CPv2. We hope this study can inspire novel insights for future research on designing bias-reduction approaches.",

author = "Daowan Peng and Wei Wei and Mao, {Xian Ling} and Yuanyuan Fu and Dangyang Chen",

note = "Publisher Copyright: {\textcopyright} 2023 International Joint Conferences on Artificial Intelligence. All rights reserved.; 32nd International Joint Conference on Artificial Intelligence, IJCAI 2023 ; Conference date: 19-08-2023 Through 25-08-2023",

year = "2023",

language = "English",

series = "IJCAI International Joint Conference on Artificial Intelligence",

publisher = "International Joint Conferences on Artificial Intelligence",

pages = "4109--4117",

editor = "Edith Elkind",

booktitle = "Proceedings of the 32nd International Joint Conference on Artificial Intelligence, IJCAI 2023",

}

Peng, D, Wei, W, Mao, XL, Fu, Y & Chen, D 2023, An Empirical Study on the Language Modal in Visual Question Answering. in E Elkind (ed.), Proceedings of the 32nd International Joint Conference on Artificial Intelligence, IJCAI 2023. IJCAI International Joint Conference on Artificial Intelligence, vol. 2023-August, International Joint Conferences on Artificial Intelligence, pp. 4109-4117, 32nd International Joint Conference on Artificial Intelligence, IJCAI 2023, Macao, China, 19/08/23.

An Empirical Study on the Language Modal in Visual Question Answering. / Peng, Daowan; Wei, Wei; Mao, Xian Ling et al.
Proceedings of the 32nd International Joint Conference on Artificial Intelligence, IJCAI 2023. ed. / Edith Elkind. International Joint Conferences on Artificial Intelligence, 2023. p. 4109-4117 (IJCAI International Joint Conference on Artificial Intelligence; Vol. 2023-August).

TY - GEN

T1 - An Empirical Study on the Language Modal in Visual Question Answering

AU - Peng, Daowan

AU - Wei, Wei

AU - Mao, Xian Ling

AU - Fu, Yuanyuan

AU - Chen, Dangyang

PY - 2023

Y1 - 2023

N2 - Generalization beyond in-domain experience to out-of-distribution data is of paramount significance in the AI domain. Of late, state-of-the-art Visual Question Answering (VQA) models have shown impressive performance on in-domain data, partially due to the language priors bias which, however, hinders the generalization ability in practice. This paper attempts to provide new insights into the influence of language modality on VQA performance from an empirical study perspective. To achieve this, we conducted a series of experiments on six models. The results of these experiments revealed that, 1) apart from prior bias caused by question types, there is a notable influence of postfix-related bias in inducing biases, and 2) training VQA models with word-sequence-related variant questions demonstrated improved performance on the out-of-distribution benchmark, and the LXMERT even achieved a 10-point gain without adopting any debiasing methods. We delved into the underlying reasons behind these experimental results and put forward some simple proposals to reduce the models' dependency on language priors. The experimental results demonstrated the effectiveness of our proposed method in improving performance on the out-of-distribution benchmark, VQA-CPv2. We hope this study can inspire novel insights for future research on designing bias-reduction approaches.

UR - http://www.scopus.com/inward/record.url?scp=85170381960&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:85170381960

T3 - IJCAI International Joint Conference on Artificial Intelligence

SP - 4109

EP - 4117

BT - Proceedings of the 32nd International Joint Conference on Artificial Intelligence, IJCAI 2023

A2 - Elkind, Edith

PB - International Joint Conferences on Artificial Intelligence

T2 - 32nd International Joint Conference on Artificial Intelligence, IJCAI 2023

Y2 - 19 August 2023 through 25 August 2023

ER -

Peng D, Wei W, Mao XL, Fu Y, Chen D. An Empirical Study on the Language Modal in Visual Question Answering. In Elkind E, editor, Proceedings of the 32nd International Joint Conference on Artificial Intelligence, IJCAI 2023. International Joint Conferences on Artificial Intelligence. 2023. p. 4109-4117. (IJCAI International Joint Conference on Artificial Intelligence).
