众包数据库综述

doi:10.11897/SP.J.1016.2020.00948

Translated title of the contribution: Crowd-Powered Database System: A Survey

Cheng Liang Chai, Guo Liang Li^*, Tian Yu Zhao, Yu Yu Luo, Ming He Yu

^*Corresponding author for this work

Research output: Contribution to journal › Review article › peer-review

1 Citation (Scopus)

Abstract

Many data management tasks cannot be solved by machine-based algorithms alone. Crowdsourcing, which leverages the abilities of the crowd to address problems that are hard for computers, has therefore attracted the interest of many researchers. Thanks to crowdsourcing platforms such as Amazon Mechanical Turk, one can easily hire hundreds of thousands of workers to solve these computer-hard tasks. The technical difficulty of crowdsourcing lies in the complexity of the interactions among its three components, namely requesters, platforms and workers, which makes it hard for requesters to use the platforms and manage their tasks. For example, interacting with crowdsourcing platforms is inconvenient because requesters must set parameters and write code to display the tasks. Inspired by traditional DBMSs, crowdsourcing database systems have been proposed to encapsulate the complexity of interacting with the crowd. The challenges include how to make crowdsourcing platforms easy to use, how to design query optimization models that optimize crowdsourcing cost, quality and latency, and how to support complex crowdsourcing operations. In this paper, we survey a wide spectrum of existing studies on crowdsourcing database systems. We first give an overview of crowdsourcing and then introduce the fundamental techniques for designing crowdsourcing databases, including truth inference, task assignment and cost control, focusing on techniques for improving quality, reducing cost and reducing latency. Next, we describe several popular crowd-powered database systems, including Deco, Qurk, CrowdDB and CDB, and discuss their query languages, query optimization models and supported operations. We then review techniques for designing different operators, including selection, join and sort, focusing on how to optimize cost, quality and latency for these operators. Finally, we discuss future work and open challenges.

Original language	Chinese (Traditional)
Pages (from-to)	948-972
Number of pages	25
Journal	Jisuanji Xuebao/Chinese Journal of Computers
Volume	43
Issue number	5
DOIs	https://doi.org/10.11897/SP.J.1016.2020.00948
Publication status	Published - 1 May 2020
Externally published	Yes

Cite this

Chai, C. L., Li, G. L., Zhao, T. Y., Luo, Y. Y., & Yu, M. H. (2020). 众包数据库综述. Jisuanji Xuebao/Chinese Journal of Computers, 43(5), 948-972. https://doi.org/10.11897/SP.J.1016.2020.00948

@article{7cf411112b4f4ec1995e26108226c910,
title = "众包数据库综述",
abstract = "Many data management tasks cannot be solved by machine-based algorithms alone. Crowdsourcing, which leverages the abilities of the crowd to address problems that are hard for computers, has therefore attracted the interest of many researchers. Thanks to crowdsourcing platforms such as Amazon Mechanical Turk, one can easily hire hundreds of thousands of workers to solve these computer-hard tasks. The technical difficulty of crowdsourcing lies in the complexity of the interactions among its three components, namely requesters, platforms and workers, which makes it hard for requesters to use the platforms and manage their tasks. For example, interacting with crowdsourcing platforms is inconvenient because requesters must set parameters and write code to display the tasks. Inspired by traditional DBMSs, crowdsourcing database systems have been proposed to encapsulate the complexity of interacting with the crowd. The challenges include how to make crowdsourcing platforms easy to use, how to design query optimization models that optimize crowdsourcing cost, quality and latency, and how to support complex crowdsourcing operations. In this paper, we survey a wide spectrum of existing studies on crowdsourcing database systems. We first give an overview of crowdsourcing and then introduce the fundamental techniques for designing crowdsourcing databases, including truth inference, task assignment and cost control, focusing on techniques for improving quality, reducing cost and reducing latency. Next, we describe several popular crowd-powered database systems, including Deco, Qurk, CrowdDB and CDB, and discuss their query languages, query optimization models and supported operations. We then review techniques for designing different operators, including selection, join and sort, focusing on how to optimize cost, quality and latency for these operators. Finally, we discuss future work and open challenges.",
keywords = "Cost optimization, Crowd-powered, Database, Quality control, Query optimization",
author = "Chai, {Cheng Liang} and Li, {Guo Liang} and Zhao, {Tian Yu} and Luo, {Yu Yu} and Yu, {Ming He}",
year = "2020",
month = may,
day = "1",
doi = "10.11897/SP.J.1016.2020.00948",
language = "Chinese (Traditional)",
volume = "43",
pages = "948--972",
journal = "Jisuanji Xuebao/Chinese Journal of Computers",
issn = "0254-4164",
publisher = "Science Press",
number = "5",
}

TY  - JOUR
T1  - 众包数据库综述
AU  - Chai, Cheng Liang
AU  - Li, Guo Liang
AU  - Zhao, Tian Yu
AU  - Luo, Yu Yu
AU  - Yu, Ming He
PY  - 2020/5/1
Y1  - 2020/5/1
N2  - Many data management tasks cannot be solved by machine-based algorithms alone. Crowdsourcing, which leverages the abilities of the crowd to address problems that are hard for computers, has therefore attracted the interest of many researchers. Thanks to crowdsourcing platforms such as Amazon Mechanical Turk, one can easily hire hundreds of thousands of workers to solve these computer-hard tasks. The technical difficulty of crowdsourcing lies in the complexity of the interactions among its three components, namely requesters, platforms and workers, which makes it hard for requesters to use the platforms and manage their tasks. For example, interacting with crowdsourcing platforms is inconvenient because requesters must set parameters and write code to display the tasks. Inspired by traditional DBMSs, crowdsourcing database systems have been proposed to encapsulate the complexity of interacting with the crowd. The challenges include how to make crowdsourcing platforms easy to use, how to design query optimization models that optimize crowdsourcing cost, quality and latency, and how to support complex crowdsourcing operations. In this paper, we survey a wide spectrum of existing studies on crowdsourcing database systems. We first give an overview of crowdsourcing and then introduce the fundamental techniques for designing crowdsourcing databases, including truth inference, task assignment and cost control, focusing on techniques for improving quality, reducing cost and reducing latency. Next, we describe several popular crowd-powered database systems, including Deco, Qurk, CrowdDB and CDB, and discuss their query languages, query optimization models and supported operations. We then review techniques for designing different operators, including selection, join and sort, focusing on how to optimize cost, quality and latency for these operators. Finally, we discuss future work and open challenges.
KW  - Cost optimization
KW  - Crowd-powered
KW  - Database
KW  - Quality control
KW  - Query optimization
UR  - http://www.scopus.com/inward/record.url?scp=85089895116&partnerID=8YFLogxK
U2  - 10.11897/SP.J.1016.2020.00948
DO  - 10.11897/SP.J.1016.2020.00948
M3  - Review article
AN  - SCOPUS:85089895116
SN  - 0254-4164
VL  - 43
SP  - 948
EP  - 972
JO  - Jisuanji Xuebao/Chinese Journal of Computers
JF  - Jisuanji Xuebao/Chinese Journal of Computers
IS  - 5
ER  -
