Crowd-Powered Database System: A Survey (众包数据库综述)

Cheng Liang Chai, Guo Liang Li*, Tian Yu Zhao, Yu Yu Luo, Ming He Yu

*Corresponding author for this work

Research output: Contribution to journal, review article, peer-reviewed

1 citation (Scopus)

Abstract

Many data management tasks cannot be solved by machine-based algorithms alone. Crowdsourcing, which leverages human workers to address problems that are hard for computers, has therefore attracted considerable research interest. Thanks to crowdsourcing platforms such as Amazon Mechanical Turk, one can easily hire hundreds of thousands of workers to resolve these computer-hard tasks. The main technical difficulty of crowdsourcing lies in the complexity of the interactions among its three components (requesters, platforms and workers), which makes it hard for requesters to deploy and manage their tasks. For example, interacting with a crowdsourcing platform is inconvenient because requesters must set parameters and write code to display the tasks. Inspired by traditional DBMSs, crowdsourcing database systems have been proposed to encapsulate the complexities of interacting with the crowd. The challenges include how to make crowdsourcing platforms easy to use, how to design query optimization models that optimize crowdsourcing cost, quality and latency, and how to support complex crowdsourcing operations. In this paper, we survey a wide spectrum of existing studies on crowdsourcing database systems. We first give an overview of crowdsourcing and then introduce the fundamental techniques for designing crowdsourcing databases, including truth inference, task assignment and cost control. In this part, we focus on sophisticated techniques for improving quality, reducing cost and reducing latency. Next, we describe several popular crowd-powered database systems, including Deco, Qurk, CrowdDB and CDB, and discuss their query languages, query optimization models and supported operations. Moreover, we review techniques for designing different operators, including selection, join and sort, focusing on how to optimize cost, quality and latency for these operators. Finally, we discuss future work and open challenges.

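Translated title of the contribution: Crowd-Powered Database System: A Survey
Original language: Traditional Chinese
Pages (from-to): 948-972
Number of pages: 25
Journal: Jisuanji Xuebao/Chinese Journal of Computers
Volume: 43
Issue number: 5
DOI: https://doi.org/10.11897/SP.J.1016.2020.00948
Publication status: Published - 1 May 2020
Externally published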

Keywords

  • Cost optimization
  • Crowd-powered
  • Database
  • Quality control
  • Query optimization

Cite this

Chai, C. L., Li, G. L., Zhao, T. Y., Luo, Y. Y., & Yu, M. H. (2020). 众包数据库综述. Jisuanji Xuebao/Chinese Journal of Computers, 43(5), 948-972. https://doi.org/10.11897/SP.J.1016.2020.00948