Deep Hierarchical Attention Flow for Visual Commonsense Reasoning

Yuansheng Song, Ping Jian*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-reviewed

3 Citations (Scopus)

Abstract

Visual Commonsense Reasoning (VCR) requires a thorough understanding of the general information connecting language and vision, as well as background world knowledge. In this paper, we introduce a novel and powerful deep hierarchical attention flow framework, which takes full advantage of the text information in the query and candidate responses to perform reasoning over the image. Moreover, inspired by the success of machine reading comprehension, we also model the correlation among candidate responses to obtain better response representations. Extensive quantitative and qualitative experiments are conducted to evaluate the proposed model. Empirical results on the VCR1.0 benchmark show that the proposed model outperforms existing strong baselines, which demonstrates the effectiveness of our method.

Original language: English
Title of host publication: Natural Language Processing and Chinese Computing - 9th CCF International Conference, NLPCC 2020, Proceedings
Editors: Xiaodan Zhu, Min Zhang, Yu Hong, Ruifang He
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 16-28
Number of pages: 13
ISBN (Print): 9783030604493
Publication status: Published - 2020
Event: 9th CCF International Conference on Natural Language Processing and Chinese Computing, NLPCC 2020 - Zhengzhou, China
Duration: 14 Oct 2020 to 18 Oct 2020

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12430 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 9th CCF International Conference on Natural Language Processing and Chinese Computing, NLPCC 2020
Country/Territory: China
City: Zhengzhou
Period: 14/10/20 to 18/10/20

Keywords

  • Hierarchical attention flow
  • Visual commonsense reasoning
  • Visual question answering
