DADER: Hands-Off Entity Resolution with Domain Adaptation

Jianhong Tu, Xiaoyue Han, Ju Fan* (fanj@ruc.edu.cn), Nan Tang, Chengliang Chai, Guoliang Li, Xiaoyong Du

*Corresponding author for this work

Research output: Contribution to journal, Conference article, peer-reviewed

8 citations (Scopus)

Abstract

Entity resolution (ER) is a core data integration problem that identifies pairs of data instances referring to the same real-world entity. State-of-the-art ER results are achieved by deep learning (DL) based approaches. However, DL-based approaches typically require a large amount of labeled training data (i.e., matching and non-matching pairs), which incurs substantial manual labeling effort. In this paper, we introduce DADER, a hands-off deep ER system based on domain adaptation. DADER uses multiple well-labeled source ER datasets to train a DL-based ER model for a new target ER dataset that has no labels or only a few labels. To address the key challenge of domain shift, DADER judiciously selects labeled entity pairs from the source and then aligns the distributions of the source and the target using six popular domain adaptation strategies. DADER can also enlist users to gather a few labels for further improvement. We have built DADER as an open-source Python library with intuitive APIs and demonstrated its utility in supporting hands-off ER in real-world scenarios.
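
The abstract does not name the six alignment strategies, so the following is only an illustrative sketch of what "aligning distributions of the source and the target" can mean, not DADER's API or method. It computes a linear-kernel maximum mean discrepancy (the squared distance between feature means), a common discrepancy measure in domain adaptation. The function names and the toy feature vectors are hypothetical.

```python
from statistics import fmean


def mean_vector(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [fmean(col) for col in zip(*rows)]


def linear_mmd(source, target):
    """Squared distance between the source and target feature means.

    This is the maximum mean discrepancy under a linear kernel; it is
    zero when the two feature sets have identical means.
    """
    mu_s = mean_vector(source)
    mu_t = mean_vector(target)
    return sum((s - t) ** 2 for s, t in zip(mu_s, mu_t))


# Hypothetical toy features for entity pairs (not taken from the paper).
source_feats = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3]]
target_feats = [[0.5, 0.5], [0.4, 0.6], [0.3, 0.7]]

shifted = linear_mmd(source_feats, target_feats)  # domains differ
aligned = linear_mmd(source_feats, source_feats)  # identical domains
```

In a discrepancy-based setup, a term like this would typically be added to the matching loss so that the feature extractor is pushed toward representations on which source and target look alike.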

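Original language: English
Pages (from-to): 3666-3669
Number of pages: 4
Journal: Proceedings of the VLDB Endowment
Volume: 15
Issue number: 12
DOI: https://doi.org/10.14778/3554821.3554870
Publication status: Published - 2022
Externally published: Yes
Event: 48th International Conference on Very Large Data Bases, VLDB 2022 - Sydney, Australia
Duration: 5 Sep 2022 to 9 Sep 2022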

Cite this

Tu, J., Han, X., Fan, J., Tang, N., Chai, C., Li, G., & Du, X. (2022). DADER: Hands-Off Entity Resolution with Domain Adaptation. Proceedings of the VLDB Endowment, 15(12), 3666-3669. https://doi.org/10.14778/3554821.3554870