DADER: Hands-Off Entity Resolution with Domain Adaptation

Jianhong Tu; Xiaoyue Han; Ju Fan^*; Nan Tang; Chengliang Chai; Guoliang Li; Xiaoyong Du

doi:10.14778/3554821.3554870

^* Corresponding author for this work (fanj@ruc.edu.cn)

Research output: Contribution to journal › Conference article › Peer-review

8 Citations (Scopus)

Abstract

Entity resolution (ER) is a core data integration problem that identifies pairs of data instances referring to the same real-world entities, and state-of-the-art ER results are achieved by deep learning (DL)-based approaches. However, DL-based approaches typically require a large amount of labeled training data (i.e., matching and non-matching pairs), which incurs substantial manual labeling effort. In this paper, we introduce DADER, a hands-off deep ER system based on domain adaptation. DADER uses multiple well-labeled source ER datasets to train a DL-based ER model for a new target ER dataset that has no labels or only a few labels. To address the key challenge of domain shift, DADER judiciously selects labeled entity pairs from the source and then aligns the distributions of the source and the target using six popular domain adaptation strategies. DADER can also involve users in collecting a few labels for further improvement. We have built DADER as an open-source Python library with intuitive APIs and demonstrate its utility in supporting hands-off ER in real-world scenarios.
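The abstract says DADER aligns the source and target distributions but does not spell out any single strategy. As an illustration only, the sketch below computes the squared maximum mean discrepancy (MMD) with a Gaussian kernel. MMD is a common discrepancy-based way to measure how far apart two feature distributions are. The function names, the `gamma` bandwidth, and the assumption that the inputs are feature vectors produced by an ER model's encoder for entity pairs are all hypothetical. This is not DADER's API, and it is not presented as any particular one of its six strategies.

```python
import math
from typing import Sequence

# A feature vector, e.g. a hypothetical encoder's representation of one entity pair.
Vector = Sequence[float]


def rbf_kernel(a: Vector, b: Vector, gamma: float = 1.0) -> float:
    """Gaussian (RBF) kernel: k(a, b) = exp(-gamma * ||a - b||^2)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)


def mean_kernel(xs: Sequence[Vector], ys: Sequence[Vector], gamma: float) -> float:
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    total = sum(rbf_kernel(x, y, gamma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(source: Sequence[Vector], target: Sequence[Vector],
                gamma: float = 1.0) -> float:
    """Biased estimate of the squared MMD between two samples.

    MMD^2 = E[k(s, s')] + E[k(t, t')] - 2 E[k(s, t)]. With a Gaussian kernel it
    is 0 when the two samples coincide and positive whenever they differ.
    """
    if not source or not target:
        raise ValueError("source and target samples must be non-empty")
    return (mean_kernel(source, source, gamma)
            + mean_kernel(target, target, gamma)
            - 2.0 * mean_kernel(source, target, gamma))


if __name__ == "__main__":
    # Toy 2-D "pair features"; real encoders produce much higher-dimensional vectors.
    source = [[0.0, 0.0], [1.0, 0.0]]
    near_target = [[0.1, 0.0], [1.1, 0.0]]
    far_target = [[5.0, 5.0], [6.0, 5.0]]
    print(mmd_squared(source, near_target))  # small: distributions nearly aligned
    print(mmd_squared(source, far_target))   # larger: strong domain shift
```

In a training loop, a term like this would typically be added to the supervised matching loss on labeled source pairs. It would be computed on differentiable tensors rather than Python floats, so that the encoder learns features whose source and target distributions lie close together. That loop and the choice of `gamma` are left out of this sketch.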

Original language	English
Pages (from-to)	3666-3669
Number of pages	4
Journal	Proceedings of the VLDB Endowment
Volume	15
Issue number	12
DOI	https://doi.org/10.14778/3554821.3554870
Publication status	Published - 2022
Externally published	Yes
Event	48th International Conference on Very Large Data Bases, VLDB 2022 - Sydney, Australia. Duration: 5 Sep 2022 → 9 Sep 2022


Cite this

Tu, J., Han, X., Fan, J., Tang, N., Chai, C., Li, G., & Du, X. (2022). DADER: Hands-Off Entity Resolution with Domain Adaptation. Proceedings of the VLDB Endowment, 15(12), 3666-3669. https://doi.org/10.14778/3554821.3554870

@article{764eb2e91a2043d293624f809b79b9e2,

title = "DADER: Hands-Off Entity Resolution with Domain Adaptation",

abstract = "Entity resolution (ER) is a core data integration problem that identifies pairs of data instances referring to the same real-world entities, and the state-of-the-art results of ER are achieved by deep learning (DL) based approaches. However, DL-based approaches typically require a large amount of labeled training data (i.e., matching and non-matching pairs), which incurs substantial manual labeling efforts. In this paper, we introduce DADER, a hands-off deep ER system through domain adaptation. DADER utilizes multiple well-labeled source ER datasets to train a DL-based ER model for a new target ER dataset that does not have any labels or with only a few labels. To address the key challenge of domain shift, DADER judiciously selects labeled entity pairs from the source and then aligns distributions of the source and the target by using six popular domain adaptation strategies. DADER can also harness the users to gather a few labels for further improvement. We have built DADER as an open-sourced Python Library with intuitive APIs and demonstrated its utility on supporting hands-off ER in real-world scenarios.",

author = "Jianhong Tu and Xiaoyue Han and Ju Fan and Nan Tang and Chengliang Chai and Guoliang Li and Xiaoyong Du",

note = "Publisher Copyright: {\textcopyright} 2022, VLDB Endowment. All rights reserved.; 48th International Conference on Very Large Data Bases, VLDB 2022 ; Conference date: 05-09-2022 Through 09-09-2022",

year = "2022",

doi = "10.14778/3554821.3554870",

language = "English",

volume = "15",

pages = "3666--3669",

journal = "Proceedings of the VLDB Endowment",

issn = "2150-8097",

publisher = "Very Large Data Base Endowment Inc.",

number = "12",

}

TY  - JOUR
T1  - DADER: Hands-Off Entity Resolution with Domain Adaptation
T2  - 48th International Conference on Very Large Data Bases, VLDB 2022
AU  - Tu, Jianhong
AU  - Han, Xiaoyue
AU  - Fan, Ju
AU  - Tang, Nan
AU  - Chai, Chengliang
AU  - Li, Guoliang
AU  - Du, Xiaoyong
PY  - 2022
Y1  - 2022
N2  - Entity resolution (ER) is a core data integration problem that identifies pairs of data instances referring to the same real-world entities, and the state-of-the-art results of ER are achieved by deep learning (DL) based approaches. However, DL-based approaches typically require a large amount of labeled training data (i.e., matching and non-matching pairs), which incurs substantial manual labeling efforts. In this paper, we introduce DADER, a hands-off deep ER system through domain adaptation. DADER utilizes multiple well-labeled source ER datasets to train a DL-based ER model for a new target ER dataset that does not have any labels or with only a few labels. To address the key challenge of domain shift, DADER judiciously selects labeled entity pairs from the source and then aligns distributions of the source and the target by using six popular domain adaptation strategies. DADER can also harness the users to gather a few labels for further improvement. We have built DADER as an open-sourced Python Library with intuitive APIs and demonstrated its utility on supporting hands-off ER in real-world scenarios.
AB  - Entity resolution (ER) is a core data integration problem that identifies pairs of data instances referring to the same real-world entities, and the state-of-the-art results of ER are achieved by deep learning (DL) based approaches. However, DL-based approaches typically require a large amount of labeled training data (i.e., matching and non-matching pairs), which incurs substantial manual labeling efforts. In this paper, we introduce DADER, a hands-off deep ER system through domain adaptation. DADER utilizes multiple well-labeled source ER datasets to train a DL-based ER model for a new target ER dataset that does not have any labels or with only a few labels. To address the key challenge of domain shift, DADER judiciously selects labeled entity pairs from the source and then aligns distributions of the source and the target by using six popular domain adaptation strategies. DADER can also harness the users to gather a few labels for further improvement. We have built DADER as an open-sourced Python Library with intuitive APIs and demonstrated its utility on supporting hands-off ER in real-world scenarios.
UR  - http://www.scopus.com/inward/record.url?scp=85137997395&partnerID=8YFLogxK
U2  - 10.14778/3554821.3554870
DO  - 10.14778/3554821.3554870
M3  - Conference article
AN  - SCOPUS:85137997395
SN  - 2150-8097
VL  - 15
SP  - 3666
EP  - 3669
JO  - Proceedings of the VLDB Endowment
JF  - Proceedings of the VLDB Endowment
IS  - 12
Y2  - 5 September 2022 through 9 September 2022
ER  - 
