Domain Adaptation for Deep Entity Resolution

Jianhong Tu, Ju Fan*, Nan Tang, Peng Wang, Chengliang Chai, Guoliang Li, Ruixue Fan, Xiaoyong Du

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

27 Citations (Scopus)

Abstract

Entity resolution (ER) is a core problem of data integration. The state-of-the-art (SOTA) results on ER are achieved by deep learning (DL) based methods trained on large numbers of labeled matching/non-matching entity pairs. This may not be a problem when using well-prepared benchmark datasets. For many real-world ER applications, however, the situation changes dramatically, because collecting large-scale labeled datasets is painful. In this paper, we seek to answer: If we have a well-labeled source ER dataset, can we train a DL-based ER model for a target dataset with no labels or only a few labels? This setting is known as domain adaptation (DA), which has achieved great success in computer vision and natural language processing but has not been systematically studied for ER. Our goal is to systematically explore the benefits and limitations of a wide range of DA methods for ER. To this end, we develop DADER (Domain Adaptation for Deep Entity Resolution), a framework that significantly advances the application of DA to ER. We define a space of design solutions for the three modules of DADER, namely Feature Extractor, Matcher, and Feature Aligner. We conduct the most comprehensive experimental study to date to explore the design space and compare different choices of DA for ER, and we provide guidance for selecting appropriate design solutions based on extensive experiments.
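
The three-module split named in the abstract can be illustrated with a toy sketch. This is not the DADER implementation: the abstract does not specify any of the modules, so every function name below is hypothetical, the Jaccard-based extractor and threshold matcher stand in for the deep models, and mean matching is assumed here only as one simple way a feature aligner might reduce the gap between source and target feature distributions.

```python
from statistics import mean


def extract_features(pair):
    """Feature Extractor (hypothetical): token Jaccard similarity per shared attribute."""
    left, right = pair
    feats = []
    for key in sorted(left.keys() & right.keys()):
        a = set(str(left[key]).lower().split())
        b = set(str(right[key]).lower().split())
        union = a | b
        feats.append(len(a & b) / len(union) if union else 0.0)
    return feats


def column_means(rows):
    """Mean of each feature dimension over a list of feature vectors."""
    return [mean(col) for col in zip(*rows)]


def mean_discrepancy(src, tgt):
    """Squared distance between the feature means of the two domains."""
    return sum((s - t) ** 2 for s, t in zip(column_means(src), column_means(tgt)))


def align_to_source(src, tgt):
    """Feature Aligner (hypothetical): shift target features so their mean matches the source mean."""
    shift = [s - t for s, t in zip(column_means(src), column_means(tgt))]
    return [[x + d for x, d in zip(row, shift)] for row in tgt]


def fit_matcher(src_feats, labels):
    """Matcher (hypothetical): threshold halfway between the mean scores of the two classes."""
    scores = [mean(f) for f in src_feats]
    pos = mean(s for s, y in zip(scores, labels) if y == 1)
    neg = mean(s for s, y in zip(scores, labels) if y == 0)
    threshold = (pos + neg) / 2
    return lambda feats: int(mean(feats) >= threshold)


# Labeled source features and unlabeled target features (made-up numbers).
src = [[0.9, 0.8], [0.1, 0.2], [0.8, 1.0], [0.2, 0.0]]
src_labels = [1, 0, 1, 0]
tgt = [[0.4, 0.5], [0.0, 0.1], [0.5, 0.4], [0.1, 0.0]]

matcher = fit_matcher(src, src_labels)
unaligned_preds = [matcher(f) for f in tgt]
aligned_preds = [matcher(f) for f in align_to_source(src, tgt)]
```

In this toy data the target similarities are systematically lower than the source ones, so the matcher trained on the source rejects every target pair until the target features are shifted. Mean matching assumes both domains have a similar share of matching pairs; that assumption belongs to this sketch, not to the paper.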

Original language: English
Title of host publication: SIGMOD 2022 - Proceedings of the 2022 International Conference on Management of Data
Publisher: Association for Computing Machinery
Pages: 443-457
Number of pages: 15
ISBN (electronic): 9781450392495
Publication status: Published - 10 Jun 2022
Externally published
Event: 2022 ACM SIGMOD International Conference on the Management of Data, SIGMOD 2022 - Virtual, Online, United States
Duration: 12 Jun 2022 to 17 Jun 2022

Publication series

Name: Proceedings of the ACM SIGMOD International Conference on Management of Data
ISSN (print): 0730-8078

Conference

Conference: 2022 ACM SIGMOD International Conference on the Management of Data, SIGMOD 2022
Country/Territory: United States
City: Virtual, Online
Period: 12/06/22 to 17/06/22
