Demystifying Artificial Intelligence for Data Preparation

Chengliang Chai, Nan Tang, Ju Fan, Yuyu Luo

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

3 Citations (Scopus)

Abstract

Data preparation (the process of discovering, integrating, transforming, cleaning, and annotating data) is one of the oldest and hardest, yet unavoidable, data management problems. Unfortunately, data preparation is known to be iterative, costly in human effort, and error-prone. Recent advances in artificial intelligence (AI) have shown very promising results on many data preparation tasks. At a high level, AI for data preparation (AI4DP) should have the following abilities. First, the AI model should capture real-world knowledge so that it can solve various tasks. Second, it should adapt easily to new datasets and tasks. Third, data preparation is a complicated pipeline with many operations, which yields a large number of candidate pipelines from which to select the optimum, so it is crucial to explore this large space of possible pipelines effectively and efficiently. In this tutorial, we will cover three important topics to address the above issues: demystifying foundation models to inject knowledge for data preparation, tuning and adapting pre-trained language models for data preparation, and orchestrating data preparation pipelines for different downstream applications.

Original language: English
Title of host publication: SIGMOD 2023 - Companion of the 2023 ACM/SIGMOD International Conference on Management of Data
Publisher: Association for Computing Machinery
Pages: 13-20
Number of pages: 8
ISBN (Electronic): 9781450395076
Publication status: Published - 4 Jun 2023
Event: 2023 ACM/SIGMOD International Conference on Management of Data, SIGMOD 2023 - Seattle, United States
Duration: 18 Jun 2023 to 23 Jun 2023

Publication series

Name: Proceedings of the ACM SIGMOD International Conference on Management of Data
ISSN (Print): 0730-8078

Conference

Conference: 2023 ACM/SIGMOD International Conference on Management of Data, SIGMOD 2023
Country/Territory: United States
City: Seattle
Period: 18/06/23 to 23/06/23
