LakeCompass: An End-to-End System for Data Maintenance, Search and Analysis in Data Lakes

Chengliang Chai, Yuhao Deng, Yutong Zhan, Ziqi Cao, Yuanfang Zhang, Lei Cao, Yuping Wang, Zhiwei Zhang, Ye Yuan*, Guoren Wang, Nan Tang

*Corresponding author for this work

Research output: Contribution to journal › Conference article › Peer-review

Abstract

Searching for tables in poorly maintained data lakes has long been recognized as a formidable challenge in data management. Three pivotal tasks, namely keyword-based, joinable, and unionable table search, form the backbone of applications that aim to make sense of diverse datasets, such as machine learning. In this demo, we propose LakeCompass, an end-to-end prototype system that maintains abundant tabular data, supports all three search tasks effectively, and serves downstream ML modeling. Specifically, LakeCompass manages numerous real tables, over which diverse types of indexes are built to support efficient search under different user requirements. In particular, LakeCompass can automatically integrate the discovered tables to improve downstream model performance in an iterative manner. Finally, we provide both Python APIs and a Web interface to facilitate flexible user interaction.
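The abstract names keyword-based and joinable table search without describing how LakeCompass implements them. As a rough illustration only, the sketch below assumes a hypothetical in-memory lake (table name mapped to columns of string cells), uses naive case-insensitive substring matching for keyword search, and builds an exact inverted value index that ranks candidate columns by containment |Q ∩ X| / |Q| for joinable search. The names `build_value_index`, `keyword_search`, and `joinable_search` are illustrative and are not LakeCompass's Python API; practical systems often replace the exact index with approximate structures such as LSH Ensemble or JOSIE, and unionable search is not covered here.

```python
from collections import defaultdict

# Hypothetical toy lake layout (an assumption, not LakeCompass's format):
# table name -> {column name -> list of string cell values}.
Lake = dict[str, dict[str, list[str]]]


def build_value_index(lake: Lake) -> dict[str, set[tuple[str, str]]]:
    """Inverted index: normalized cell value -> set of (table, column) pairs holding it."""
    index: dict[str, set[tuple[str, str]]] = defaultdict(set)
    for table, columns in lake.items():
        for column, values in columns.items():
            for value in values:
                index[value.strip().lower()].add((table, column))
    return index


def keyword_search(lake: Lake, keyword: str) -> list[str]:
    """Return tables whose name, column names, or cells contain the keyword (case-insensitive)."""
    kw = keyword.lower()
    hits = []
    for table, columns in lake.items():
        in_name = kw in table.lower()
        in_columns = any(
            kw in column.lower() or any(kw in v.lower() for v in values)
            for column, values in columns.items()
        )
        if in_name or in_columns:
            hits.append(table)
    return hits


def joinable_search(
    index: dict[str, set[tuple[str, str]]], query_values: list[str], k: int = 3
) -> list[tuple[tuple[str, str], float]]:
    """Rank (table, column) pairs by containment |Q ∩ X| / |Q| of the query column Q."""
    query = {v.strip().lower() for v in query_values}
    overlap: dict[tuple[str, str], int] = defaultdict(int)
    for value in query:
        # Each distinct query value counts at most once per candidate column.
        for candidate in index.get(value, ()):
            overlap[candidate] += 1
    # Highest overlap first; ties broken by (table, column) name for determinism.
    ranked = sorted(overlap.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(candidate, count / len(query)) for candidate, count in ranked[:k]]


if __name__ == "__main__":
    demo_lake: Lake = {
        "cities": {"city": ["Beijing", "Guangzhou", "Shanghai"], "population": ["21.9", "18.7", "24.9"]},
        "airports": {"city": ["Guangzhou", "Shenzhen", "Beijing"], "code": ["CAN", "SZX", "PEK"]},
        "movies": {"title": ["Up", "Heat"], "year": ["2009", "1995"]},
    }
    idx = build_value_index(demo_lake)
    print(keyword_search(demo_lake, "guangzhou"))
    print(joinable_search(idx, ["Beijing", "Guangzhou", "Chengdu"]))
```

For the toy lake in the demo, both `city` columns contain two of the three query values, so each scores a containment of 2/3, while `movies` is never returned. The exact index keeps the example easy to verify; it grows with the number of distinct cell values, which is one reason large data lakes favor approximate or sketch-based indexes.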

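Original language: English
Pages (from-to): 4381-4384
Number of pages: 4
Journal: Proceedings of the VLDB Endowment
Volume: 17
Issue number: 12
DOI
Publication status: Published - 2024
Event: 50th International Conference on Very Large Data Bases, VLDB 2024 - Guangzhou, China
Duration: 24 Aug 2024 - 29 Aug 2024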
