TY - GEN
T1 - Deep Hierarchical Attention Flow for Visual Commonsense Reasoning
AU - Song, Yuansheng
AU - Jian, Ping
N1 - Publisher Copyright: © 2020, Springer Nature Switzerland AG.
PY - 2020
Y1 - 2020
N2 - Visual Commonsense Reasoning (VCR) requires a thorough understanding of general information connecting language and vision, as well as background world knowledge. In this paper, we introduce a novel yet powerful deep hierarchical attention flow framework, which takes full advantage of text information in the query and candidate responses to perform reasoning over the image. Moreover, inspired by the success of machine reading comprehension, we also model the correlation among candidate responses to obtain better response representations. Extensive quantitative and qualitative experiments are conducted to evaluate the proposed model. Empirical results on the benchmark VCR1.0 show that the proposed model outperforms existing strong baselines, which demonstrates the effectiveness of our method.
AB - Visual Commonsense Reasoning (VCR) requires a thorough understanding of general information connecting language and vision, as well as background world knowledge. In this paper, we introduce a novel yet powerful deep hierarchical attention flow framework, which takes full advantage of text information in the query and candidate responses to perform reasoning over the image. Moreover, inspired by the success of machine reading comprehension, we also model the correlation among candidate responses to obtain better response representations. Extensive quantitative and qualitative experiments are conducted to evaluate the proposed model. Empirical results on the benchmark VCR1.0 show that the proposed model outperforms existing strong baselines, which demonstrates the effectiveness of our method.
KW - Hierarchical attention flow
KW - Visual commonsense reasoning
KW - Visual question answering
UR - http://www.scopus.com/inward/record.url?scp=85093089522&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-60450-9_2
DO - 10.1007/978-3-030-60450-9_2
M3 - Conference contribution
AN - SCOPUS:85093089522
SN - 9783030604493
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 16
EP - 28
BT - Natural Language Processing and Chinese Computing - 9th CCF International Conference, NLPCC 2020, Proceedings
A2 - Zhu, Xiaodan
A2 - Zhang, Min
A2 - Hong, Yu
A2 - He, Ruifang
PB - Springer Science and Business Media Deutschland GmbH
T2 - 9th CCF International Conference on Natural Language Processing and Chinese Computing, NLPCC 2020
Y2 - 14 October 2020 through 18 October 2020
ER -