Cost-effective crowdsourced entity resolution: A partial-order approach

Chengliang Chai, Guoliang Li, Jian Li, Dong Deng, Jianhua Feng

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

84 Citations (Scopus)

Abstract

Crowdsourced entity resolution has recently attracted significant attention because it can harness the wisdom of the crowd to improve the quality of entity resolution. However, existing techniques either cannot achieve high quality or incur huge monetary costs. To address these problems, we propose a cost-effective crowdsourced entity resolution framework, which significantly reduces the monetary cost while maintaining high quality. We first define a partial order on the pairs of records. Then we select a pair as a question and ask the crowd to check whether the records in the pair refer to the same entity. After getting the answer for this pair, we infer the answers of other pairs based on the partial order. Next, we iteratively select unanswered pairs to ask until we have the answers for all pairs. We devise effective algorithms to judiciously select the pairs to ask in order to minimize the number of asked pairs. To further reduce the cost, we propose a grouping technique that groups the pairs so that we ask only one pair in each group instead of all of them. We develop error-tolerant techniques to tolerate the errors introduced by the partial order and the crowd. Experimental results show that our method reduces the cost to 1.25% of that of existing approaches (that is, existing approaches incur 80 times the monetary cost of our method) without sacrificing quality.
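The ask-then-infer loop described in the abstract can be illustrated with a minimal Python sketch. It rests on assumptions the abstract does not spell out: each record pair is represented by a vector of per-attribute similarities, and the partial order is component-wise dominance of those vectors. The names `dominates`, `resolve`, and `ask`, the middle-of-the-ranking selection heuristic, and the sample data are all hypothetical. The sketch covers neither the paper's pair-selection algorithms nor its grouping and error-tolerant techniques.

```python
from typing import Callable, Dict, List, Tuple

Vector = Tuple[float, ...]


def dominates(a: Vector, b: Vector) -> bool:
    """Assumed partial order: a is at least as similar as b on every attribute."""
    return all(x >= y for x, y in zip(a, b))


def resolve(
    pairs: Dict[str, Vector], ask: Callable[[str], bool]
) -> Tuple[Dict[str, bool], List[str]]:
    """Label every pair, asking the crowd only when no answer can be inferred."""
    answers: Dict[str, bool] = {}
    asked: List[str] = []
    while len(answers) < len(pairs):
        # Hypothetical selection heuristic: rank the unanswered pairs by total
        # similarity and ask the middle one (not the paper's selection algorithm).
        pending = sorted(
            (p for p in pairs if p not in answers),
            key=lambda p: (sum(pairs[p]), p),
        )
        pick = pending[len(pending) // 2]
        label = ask(pick)
        asked.append(pick)
        answers[pick] = label
        for other in pending:
            if other in answers:
                continue
            if label and dominates(pairs[other], pairs[pick]):
                # A pair at least as similar as a matching pair is inferred to match.
                answers[other] = True
            elif not label and dominates(pairs[pick], pairs[other]):
                # A pair no more similar than a non-matching pair is inferred not to match.
                answers[other] = False
    return answers, asked


# Hypothetical similarity vectors (e.g. name similarity, address similarity).
pairs = {
    "a": (0.9, 0.9),
    "b": (0.8, 0.7),
    "c": (0.5, 0.6),
    "d": (0.2, 0.1),
    "e": (0.1, 0.3),
}
# Simulated crowd that always answers correctly.
truth = {"a": True, "b": True, "c": False, "d": False, "e": False}
answers, asked = resolve(pairs, lambda p: truth[p])
```

In this toy run the simulated crowd is asked about three of the five pairs, and the other two labels follow from the assumed order. With a noisy crowd, or an order that does not hold for the data, the inferred labels could be wrong, which is the situation the error-tolerant techniques mentioned in the abstract are meant to handle.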

Original language: English
Title of host publication: SIGMOD 2016 - Proceedings of the 2016 International Conference on Management of Data
Publisher: Association for Computing Machinery
Pages: 969-984
Number of pages: 16
ISBN (Electronic): 9781450335317
DOI
Publication status: Published - 26 Jun 2016
Externally published
Event: 2016 ACM SIGMOD International Conference on Management of Data, SIGMOD 2016 - San Francisco, United States
Duration: 26 Jun 2016 to 1 Jul 2016

Publication series

Name: Proceedings of the ACM SIGMOD International Conference on Management of Data
26-June-2016
ISSN (Print): 0730-8078

Conference

Conference: 2016 ACM SIGMOD International Conference on Management of Data, SIGMOD 2016
Country/Territory: United States
City: San Francisco
Period: 26/06/16 to 1/07/16