Cost-Effective In-Context Learning for Entity Resolution: A Design Space Exploration

Meihao Fan, Xiaoyue Han, Ju Fan*, Chengliang Chai*, Nan Tang, Guoliang Li, Xiaoyong Du

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

Abstract

Entity resolution (ER) is an important data integration task with a wide spectrum of applications. The state-of-the-art solutions on ER rely on pre-trained language models (PLMs), which require fine-tuning on many labeled matching/non-matching entity pairs. Recently, large language models (LLMs), such as GPT-4, have shown the ability to perform many tasks without tuning model parameters. This ability, known as in-context learning (ICL), facilitates effective learning from a few labeled demonstrations provided in the input context. However, existing ICL approaches to ER typically necessitate providing a task description and a set of demonstrations for each entity pair, and are thus limited by the monetary cost of interfacing with LLMs. To address the problem, in this paper, we provide a comprehensive study to investigate how to develop a cost-effective batch prompting approach to ER. We introduce a framework, BATCHER, consisting of demonstration selection and question batching, and explore different design choices that support batch prompting for ER. We also devise a covering-based demonstration selection strategy that achieves an effective balance between matching accuracy and monetary cost. We conduct a thorough evaluation to explore the design space and evaluate our proposed strategies. Through extensive experiments, we find that batch prompting is very cost-effective for ER, compared with not only PLM-based methods fine-tuned with extensive labeled data but also LLM-based methods with manually designed prompting. We also provide guidance for selecting appropriate design choices for batch prompting.
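The cost argument in the abstract can be illustrated with a minimal sketch. It is not the BATCHER implementation: the task wording, the record serialization, the helper names, and the word-count cost proxy are all assumptions made for illustration. It only contrasts one prompt per entity pair, where the task description and demonstrations are repeated each time, with one prompt per batch of pairs, where they are shared.

```python
from typing import List, Tuple

Pair = Tuple[str, str]    # (record A, record B), both serialized as text
Demo = Tuple[Pair, bool]  # a labeled pair; True means "match"

# Hypothetical task description; the paper's actual prompt wording is not given here.
TASK = "For each question, answer yes if the two records refer to the same entity, otherwise no."


def format_pair(pair: Pair) -> str:
    left, right = pair
    return f"Record A: {left} | Record B: {right}"


def format_demos(demos: List[Demo]) -> str:
    return "\n".join(
        f"{format_pair(pair)} -> {'yes' if label else 'no'}" for pair, label in demos
    )


def standard_prompts(questions: List[Pair], demos: List[Demo]) -> List[str]:
    """One prompt per question: the task text and demonstrations repeat every time."""
    header = f"{TASK}\n{format_demos(demos)}"
    return [f"{header}\nQ1: {format_pair(q)}" for q in questions]


def batch_prompts(questions: List[Pair], demos: List[Demo], batch_size: int) -> List[str]:
    """One prompt per batch: the task text and demonstrations are shared by the batch."""
    header = f"{TASK}\n{format_demos(demos)}"
    prompts = []
    for start in range(0, len(questions), batch_size):
        batch = questions[start:start + batch_size]
        body = "\n".join(f"Q{i}: {format_pair(q)}" for i, q in enumerate(batch, 1))
        prompts.append(f"{header}\n{body}")
    return prompts


def rough_cost(prompts: List[str]) -> int:
    """Crude cost proxy: whitespace-separated word count, not a real tokenizer."""
    return sum(len(p.split()) for p in prompts)


# Made-up example data, for illustration only.
demos = [(("iPhone 13 128GB blue", "Apple iPhone 13 (128 GB) - Blue"), True),
         (("iPhone 13 128GB blue", "Galaxy S21 128GB gray"), False)]
questions = [(f"product {i}", f"item {i}") for i in range(8)]

single = standard_prompts(questions, demos)
batched = batch_prompts(questions, demos, batch_size=4)
```

Under these assumptions, `rough_cost(batched)` is smaller than `rough_cost(single)` because the shared header is sent twice instead of eight times. Which demonstrations to include and how to group questions into batches are the design choices the paper explores; this sketch fixes both arbitrarily.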

Original language: English
Title of host publication: Proceedings - 2024 IEEE 40th International Conference on Data Engineering, ICDE 2024
Publisher: IEEE Computer Society
Pages: 3696-3709
Number of pages: 14
ISBN (electronic): 9798350317152
DOI
Publication status: Published - 2024
Event: 40th IEEE International Conference on Data Engineering, ICDE 2024 - Utrecht, Netherlands
Duration: 13 May 2024 to 17 May 2024

Publication series

Name: Proceedings - International Conference on Data Engineering
ISSN (print): 1084-4627
ISSN (electronic): 2375-0286

Conference

Conference: 40th IEEE International Conference on Data Engineering, ICDE 2024
Country/Territory: Netherlands
City: Utrecht
Period: 13/05/24 to 17/05/24
