Deep Hierarchical Attention Flow for Visual Commonsense Reasoning

Yuansheng Song, Ping Jian*

*Corresponding author for this work

Research output: Chapter in book/report/conference proceeding; Conference contribution; Peer-reviewed

3 Citations (Scopus)

Abstract

Visual Commonsense Reasoning (VCR) requires a thorough understanding of the general information connecting language and vision, as well as background world knowledge. In this paper, we introduce a novel deep hierarchical attention flow framework, which takes full advantage of the text information in the query and candidate responses to perform reasoning over the image. Moreover, inspired by the success of machine reading comprehension, we also model the correlation among candidate responses to obtain better response representations. Extensive quantitative and qualitative experiments are conducted to evaluate the proposed model. Empirical results on the VCR1.0 benchmark show that the proposed model outperforms existing strong baselines, which demonstrates the effectiveness of our method.

Original language: English
Title of host publication: Natural Language Processing and Chinese Computing - 9th CCF International Conference, NLPCC 2020, Proceedings
Editors: Xiaodan Zhu, Min Zhang, Yu Hong, Ruifang He
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 16-28
Number of pages: 13
ISBN (Print): 9783030604493
DOI
Publication status: Published - 2020
Event: 9th CCF International Conference on Natural Language Processing and Chinese Computing, NLPCC 2020 - Zhengzhou, China
Duration: 14 Oct 2020 to 18 Oct 2020

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12430 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 9th CCF International Conference on Natural Language Processing and Chinese Computing, NLPCC 2020
Country/Territory: China
City: Zhengzhou
Period: 14/10/20 to 18/10/20
