众包数据库综述 (Crowd-Powered Database System: A Survey)

Cheng Liang Chai; Guo Liang Li; Tian Yu Zhao; Yu Yu Luo; Ming He Yu

doi:10.11897/SP.J.1016.2020.00948

Corresponding author: Guo Liang Li

Research output: Contribution to journal › Review article › Peer-reviewed

1 citation (Scopus)

Abstract

Many data management tasks cannot be resolved by machine-based algorithms alone. Crowdsourcing, which leverages the ability of the crowd to address problems that are hard for computers, has therefore attracted the interest of many researchers. Thanks to crowdsourcing platforms such as Amazon Mechanical Turk, we can easily hire hundreds of thousands of workers to resolve these computer-hard tasks. The technical difficulty of crowdsourcing lies in the complexity of the interactions among the three components involved (requesters, platforms and workers), which makes it hard for requesters to use the platforms and manage their tasks. For example, interacting with a crowdsourcing platform is inconvenient for requesters, who must set parameters and write code to display their tasks. Inspired by traditional DBMSs, crowdsourcing database systems have been proposed to encapsulate the complexities of interacting with the crowd. The challenges include how to make crowdsourcing platforms easy to use, how to design query optimization models that optimize crowdsourcing cost, quality and latency, and how to support complex crowdsourcing operations. In this paper, we survey a wide spectrum of existing studies on crowdsourcing database systems. We first give an overview of crowdsourcing and then introduce the fundamental techniques for designing crowdsourcing databases, including truth inference, task assignment and cost control. In this part, we focus on reviewing sophisticated techniques for improving quality, reducing cost and reducing latency. Next, we describe several popular crowd-powered database systems, including Deco, Qurk, CrowdDB and CDB, and mainly discuss their query languages, query optimization models and supported operations. Moreover, we review techniques for designing different operators, including selection, join and sort, mainly focusing on how to optimize the cost, quality and latency of these operators. Finally, we discuss future work and challenges.

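Translated title of the contribution	Crowd-Powered Database System: A Survey
Original language	Chinese (Traditional)
Pages (from-to)	948-972
Number of pages	25
Journal	Jisuanji Xuebao/Chinese Journal of Computers
Volume	43
Issue number	5
DOI	https://doi.org/10.11897/SP.J.1016.2020.00948
Publication status	Published - 1 May 2020
Externally published	Yes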

Keywords

Cost optimization
Crowd-powered
Database
Quality control
Query optimization

Cite this

Chai, C. L., Li, G. L., Zhao, T. Y., Luo, Y. Y., & Yu, M. H. (2020). 众包数据库综述. Jisuanji Xuebao/Chinese Journal of Computers, 43(5), 948-972. https://doi.org/10.11897/SP.J.1016.2020.00948

@article{7cf411112b4f4ec1995e26108226c910,

title = "众包数据库综述",

abstract = "Many data management tasks cannot be resolved by machine-based algorithms alone. Crowdsourcing, which leverages the ability of the crowd to address problems that are hard for computers, has therefore attracted the interest of many researchers. Thanks to crowdsourcing platforms such as Amazon Mechanical Turk, we can easily hire hundreds of thousands of workers to resolve these computer-hard tasks. The technical difficulty of crowdsourcing lies in the complexity of the interactions among the three components involved (requesters, platforms and workers), which makes it hard for requesters to use the platforms and manage their tasks. For example, interacting with a crowdsourcing platform is inconvenient for requesters, who must set parameters and write code to display their tasks. Inspired by traditional DBMSs, crowdsourcing database systems have been proposed to encapsulate the complexities of interacting with the crowd. The challenges include how to make crowdsourcing platforms easy to use, how to design query optimization models that optimize crowdsourcing cost, quality and latency, and how to support complex crowdsourcing operations. In this paper, we survey a wide spectrum of existing studies on crowdsourcing database systems. We first give an overview of crowdsourcing and then introduce the fundamental techniques for designing crowdsourcing databases, including truth inference, task assignment and cost control. In this part, we focus on reviewing sophisticated techniques for improving quality, reducing cost and reducing latency. Next, we describe several popular crowd-powered database systems, including Deco, Qurk, CrowdDB and CDB, and mainly discuss their query languages, query optimization models and supported operations. Moreover, we review techniques for designing different operators, including selection, join and sort, mainly focusing on how to optimize the cost, quality and latency of these operators. Finally, we discuss future work and challenges.",

keywords = "Cost optimization, Crowd-powered, Database, Quality control, Query optimization",

author = "Chai, {Cheng Liang} and Li, {Guo Liang} and Zhao, {Tian Yu} and Luo, {Yu Yu} and Yu, {Ming He}",

year = "2020",

month = may,

day = "1",

doi = "10.11897/SP.J.1016.2020.00948",

language = "Chinese (Traditional)",

volume = "43",

pages = "948--972",

journal = "Jisuanji Xuebao/Chinese Journal of Computers",

issn = "0254-4164",

publisher = "Science Press",

number = "5",

}

TY - JOUR

T1 - 众包数据库综述

AU - Chai, Cheng Liang

AU - Li, Guo Liang

AU - Zhao, Tian Yu

AU - Luo, Yu Yu

AU - Yu, Ming He

PY - 2020/5/1

Y1 - 2020/5/1

N2 - Many data management tasks cannot be resolved by machine-based algorithms alone. Crowdsourcing, which leverages the ability of the crowd to address problems that are hard for computers, has therefore attracted the interest of many researchers. Thanks to crowdsourcing platforms such as Amazon Mechanical Turk, we can easily hire hundreds of thousands of workers to resolve these computer-hard tasks. The technical difficulty of crowdsourcing lies in the complexity of the interactions among the three components involved (requesters, platforms and workers), which makes it hard for requesters to use the platforms and manage their tasks. For example, interacting with a crowdsourcing platform is inconvenient for requesters, who must set parameters and write code to display their tasks. Inspired by traditional DBMSs, crowdsourcing database systems have been proposed to encapsulate the complexities of interacting with the crowd. The challenges include how to make crowdsourcing platforms easy to use, how to design query optimization models that optimize crowdsourcing cost, quality and latency, and how to support complex crowdsourcing operations. In this paper, we survey a wide spectrum of existing studies on crowdsourcing database systems. We first give an overview of crowdsourcing and then introduce the fundamental techniques for designing crowdsourcing databases, including truth inference, task assignment and cost control. In this part, we focus on reviewing sophisticated techniques for improving quality, reducing cost and reducing latency. Next, we describe several popular crowd-powered database systems, including Deco, Qurk, CrowdDB and CDB, and mainly discuss their query languages, query optimization models and supported operations. Moreover, we review techniques for designing different operators, including selection, join and sort, mainly focusing on how to optimize the cost, quality and latency of these operators. Finally, we discuss future work and challenges.

KW - Cost optimization

KW - Crowd-powered

KW - Database

KW - Quality control

KW - Query optimization

UR - http://www.scopus.com/inward/record.url?scp=85089895116&partnerID=8YFLogxK

U2 - 10.11897/SP.J.1016.2020.00948

DO - 10.11897/SP.J.1016.2020.00948

M3 - Review article

AN - SCOPUS:85089895116

SN - 0254-4164

VL - 43

SP - 948

EP - 972

JO - Jisuanji Xuebao/Chinese Journal of Computers

JF - Jisuanji Xuebao/Chinese Journal of Computers

IS - 5

ER -
