TY - GEN
T1 - Across Images and Graphs for Question Answering
AU - Wen, Zhenyu
AU - Qian, Jiaxu
AU - Qian, Bin
AU - Yuan, Qin
AU - Qin, Jianbin
AU - Xuan, Qi
AU - Yuan, Ye
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Cross-source query serves as a proxy for scene understanding to support many web applications such as recommendation systems, e-commerce, and e-learning applications. In this paper, we propose SVQA, which semantically combines the knowledge from available images and graphs to answer complex questions. To this end, we design a graph-based method to unify various data sources into one representation. We then develop a complex question parsing method that utilizes the structure of languages to transform the query into a query graph. We also build a graph query engine that executes the query graph over the unified data source while optimizing the query process. To evaluate the proposed system, we build a vanilla dataset called MVQA and show that the state-of-the-art (SOTA) VQA models fail to perform our task. The comprehensive evaluations show that the proposed SVQA is able to reason about implicit relationships over multiple images and external knowledge to correctly answer a complex query. We hope that our first attempt provides researchers with a fresh taste of multimodal data analysis.
AB - Cross-source query serves as a proxy for scene understanding to support many web applications such as recommendation systems, e-commerce, and e-learning applications. In this paper, we propose SVQA, which semantically combines the knowledge from available images and graphs to answer complex questions. To this end, we design a graph-based method to unify various data sources into one representation. We then develop a complex question parsing method that utilizes the structure of languages to transform the query into a query graph. We also build a graph query engine that executes the query graph over the unified data source while optimizing the query process. To evaluate the proposed system, we build a vanilla dataset called MVQA and show that the state-of-the-art (SOTA) VQA models fail to perform our task. The comprehensive evaluations show that the proposed SVQA is able to reason about implicit relationships over multiple images and external knowledge to correctly answer a complex query. We hope that our first attempt provides researchers with a fresh taste of multimodal data analysis.
KW - Data Mining and Knowledge Discovery
KW - Query Processing, Indexing and Optimization
KW - Text, Semi-Structured Data, IR, Image, and Multimedia Databases
UR - https://www.scopus.com/pages/publications/85200447291
U2 - 10.1109/ICDE60146.2024.00112
DO - 10.1109/ICDE60146.2024.00112
M3 - Conference contribution
AN - SCOPUS:85200447291
T3 - Proceedings - International Conference on Data Engineering
SP - 1366
EP - 1379
BT - Proceedings - 2024 IEEE 40th International Conference on Data Engineering, ICDE 2024
PB - IEEE Computer Society
T2 - 40th IEEE International Conference on Data Engineering, ICDE 2024
Y2 - 13 May 2024 through 17 May 2024
ER -