TY - JOUR
T1 - LakeCompass
T2 - 50th International Conference on Very Large Data Bases, VLDB 2024
AU - Chai, Chengliang
AU - Deng, Yuhao
AU - Zhan, Yutong
AU - Cao, Ziqi
AU - Zhang, Yuanfang
AU - Cao, Lei
AU - Wang, Yuping
AU - Zhang, Zhiwei
AU - Yuan, Ye
AU - Wang, Guoren
AU - Tang, Nan
N1 - Publisher Copyright: © 2024, VLDB Endowment. All rights reserved.
PY - 2024
Y1 - 2024
AB - Searching tables from poorly maintained data lakes has long been recognized as a formidable challenge in the realm of data management. There are three pivotal tasks: keyword-based, joinable, and unionable table search, which form the backbone of tasks that aim to make sense of diverse datasets, such as machine learning. In this demo, we propose LakeCompass, an end-to-end prototype system that maintains abundant tabular data, supports all of the above search tasks with high efficacy, and serves downstream ML modeling well. Specifically, LakeCompass manages numerous real tables over which diverse types of indexes are built to support efficient search based on different user requirements. In particular, LakeCompass can automatically integrate the discovered tables to iteratively improve downstream model performance. Finally, we provide both Python APIs and a Web interface to facilitate flexible user interaction.
UR - http://www.scopus.com/inward/record.url?scp=85205297002&partnerID=8YFLogxK
U2 - 10.14778/3685800.3685880
DO - 10.14778/3685800.3685880
M3 - Conference article
AN - SCOPUS:85205297002
SN - 2150-8097
VL - 17
SP - 4381
EP - 4384
JO - Proceedings of the VLDB Endowment
JF - Proceedings of the VLDB Endowment
IS - 12
Y2 - 24 August 2024 through 29 August 2024
ER -