An Effective Framework for Enhancing Query Answering in a Heterogeneous Data Lake

Qin Yuan, Ye Yuan*, Zhenyu Wen, He Wang, Shiyuan Tang

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

2 Citations (Scopus)

Abstract

Interest in cross-source searching as a way to gain rich knowledge has grown in recent years. A data lake collects massive amounts of raw, heterogeneous data with different data schemas and query interfaces. Many real-life applications, such as e-commerce, bioinformatics and healthcare, require query answering over a heterogeneous data lake. In this paper, we propose LakeAns, which semantically integrates the heterogeneous data schemas of the lake to enhance the semantics of query answers. To this end, we propose a novel framework that performs cross-source searching efficiently and effectively. The framework uses a reinforcement learning method to semantically integrate the data schemas and then creates a global relational schema for the heterogeneous data. It then runs a query answering algorithm over the global schema to find answers across multiple data sources. Extensive experimental evaluations on real-life data verify that our approach outperforms existing solutions in terms of effectiveness and efficiency.

Original language: English
Title of host publication: SIGIR 2023 - Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval
Publisher: Association for Computing Machinery, Inc
Pages: 770-780
Number of pages: 11
ISBN (Electronic): 9781450394086
Publication status: Published - 19 Jul 2023
Event: 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2023 - Taipei, Taiwan, Province of China
Duration: 23 Jul 2023 - 27 Jul 2023

Publication series

Name: SIGIR 2023 - Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval

Conference

Conference: 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2023
Country/Territory: Taiwan, Province of China
City: Taipei
Period: 23/07/23 - 27/07/23

Keywords

  • heterogeneous data lake
  • query answering
  • relational schema
