CDB: Optimizing queries with crowd-based selections and joins

Guoliang Li; Chengliang Chai; Ju Fan; Xueping Weng; Jian Li; Yudian Zheng; Yuanbing Li; Xiang Yu; Xiaohang Zhang; Haitao Yuan

doi:10.1145/3035918.3064036

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

68 Citations (Scopus)

Abstract

Crowdsourcing database systems have been proposed to leverage crowd-powered operations to encapsulate the complexities of interacting with the crowd. Existing systems suffer from two major limitations. First, to optimize a query, they often adopt the traditional tree model to select an optimized table-level join order. However, the tree model provides only coarse-grained optimization: it generates the same order for all joined tuples, which forgoes the potential gain from optimizing different joined tuples with different orders. Second, they mainly focus on optimizing monetary cost. In fact, crowdsourcing has three optimization goals (i.e., smaller monetary cost, lower latency, and higher quality), which calls for a system that enables multi-goal optimization. To address these limitations, we develop CDB, a crowd-powered database system that supports crowd-based query optimizations, with a focus on join and selection. CDB differs fundamentally from existing systems. First, CDB employs a graph-based query model that provides more fine-grained query optimization. Second, CDB adopts a unified framework to perform multi-goal optimization based on the graph model. We have implemented our system and deployed it on AMT, CrowdFlower and ChinaCrowd. We have also created a benchmark for evaluating crowd-powered databases. We have conducted both simulated and real experiments, and the experimental results demonstrate the performance superiority of CDB on cost, latency and quality.

Original language	English
Title of host publication	SIGMOD 2017 - Proceedings of the 2017 ACM International Conference on Management of Data
Publisher	Association for Computing Machinery
Pages	1463-1478
Number of pages	16
ISBN (Electronic)	9781450341974
DOIs	https://doi.org/10.1145/3035918.3064036
Publication status	Published - 9 May 2017
Externally published	Yes
Event	2017 ACM SIGMOD International Conference on Management of Data, SIGMOD 2017 - Chicago, United States Duration: 14 May 2017 → 19 May 2017

Publication series

Name	Proceedings of the ACM SIGMOD International Conference on Management of Data
Volume	Part F127746
ISSN (Print)	0730-8078

Conference

Conference	2017 ACM SIGMOD International Conference on Management of Data, SIGMOD 2017
Country/Territory	United States
City	Chicago
Period	14/05/17 → 19/05/17

Keywords

Crowd-based join
Crowd-based selection
Crowdsourcing
Crowdsourcing optimization

Access to Document

10.1145/3035918.3064036

Cite this

Li, G., Chai, C., Fan, J., Weng, X., Li, J., Zheng, Y., Li, Y., Yu, X., Zhang, X., & Yuan, H. (2017). CDB: Optimizing queries with crowd-based selections and joins. In SIGMOD 2017 - Proceedings of the 2017 ACM International Conference on Management of Data (pp. 1463-1478). (Proceedings of the ACM SIGMOD International Conference on Management of Data; Vol. Part F127746). Association for Computing Machinery. https://doi.org/10.1145/3035918.3064036

@inproceedings{0e5eb18f4fa34d5ab3cfa71eccd475d4,

title = "CDB: Optimizing queries with crowd-based selections and joins",

abstract = "Crowdsourcing database systems have been proposed to leverage crowd-powered operations to encapsulate the complexities of interacting with the crowd. Existing systems suffer from two major limitations. Firstly, in order to optimize a query, they often adopt the traditional tree model to select an optimized table-level join order. However, the tree model provides a coarse-grained optimization, which generates the same order for different joined tuples and limits the optimization potential that different joined tuples can be optimized by different orders. Secondly, they mainly focus on optimizing the monetary cost. In fact, there are three optimization goals (i.e., smaller monetary cost, lower latency, and higher quality) in crowdsourcing, and it calls for a system to enable multi-goal optimization. To address the limitations, we develop a crowd-powered database system CDB that supports crowd-based query optimizations, with focus on join and selection. CDB has fundamental differences from existing systems. First, CDB employs a graph-based query model that provides more fine-grained query optimization. Second, CDB adopts a unified framework to perform the multi-goal optimization based on the graph model. We have implemented our system and deployed it on AMT, CrowdFlower and ChinaCrowd. We have also created a benchmark for evaluating crowd-powered databases. We have conducted both simulated and real experiments, and the experimental results demonstrate the performance superiority of CDB on cost, latency and quality.",

keywords = "Crowd-based join, Crowd-based selection, Crowdsourcing, Crowdsourcing optimization",

author = "Guoliang Li and Chengliang Chai and Ju Fan and Xueping Weng and Jian Li and Yudian Zheng and Yuanbing Li and Xiang Yu and Xiaohang Zhang and Haitao Yuan",

note = "Publisher Copyright: {\textcopyright} 2017 ACM.; 2017 ACM SIGMOD International Conference on Management of Data, SIGMOD 2017 ; Conference date: 14-05-2017 Through 19-05-2017",

year = "2017",

month = may,

day = "9",

doi = "10.1145/3035918.3064036",

language = "English",

series = "Proceedings of the ACM SIGMOD International Conference on Management of Data",

publisher = "Association for Computing Machinery",

pages = "1463--1478",

booktitle = "SIGMOD 2017 - Proceedings of the 2017 ACM International Conference on Management of Data",

}

Li, G, Chai, C, Fan, J, Weng, X, Li, J, Zheng, Y, Li, Y, Yu, X, Zhang, X & Yuan, H 2017, CDB: Optimizing queries with crowd-based selections and joins. in SIGMOD 2017 - Proceedings of the 2017 ACM International Conference on Management of Data. Proceedings of the ACM SIGMOD International Conference on Management of Data, vol. Part F127746, Association for Computing Machinery, pp. 1463-1478, 2017 ACM SIGMOD International Conference on Management of Data, SIGMOD 2017, Chicago, United States, 14/05/17. https://doi.org/10.1145/3035918.3064036

CDB: Optimizing queries with crowd-based selections and joins. / Li, Guoliang; Chai, Chengliang; Fan, Ju et al.
SIGMOD 2017 - Proceedings of the 2017 ACM International Conference on Management of Data. Association for Computing Machinery, 2017. p. 1463-1478 (Proceedings of the ACM SIGMOD International Conference on Management of Data; Vol. Part F127746).

TY - GEN

T1 - CDB: Optimizing queries with crowd-based selections and joins

T2 - 2017 ACM SIGMOD International Conference on Management of Data, SIGMOD 2017

AU - Li, Guoliang

AU - Chai, Chengliang

AU - Fan, Ju

AU - Weng, Xueping

AU - Li, Jian

AU - Zheng, Yudian

AU - Li, Yuanbing

AU - Yu, Xiang

AU - Zhang, Xiaohang

AU - Yuan, Haitao

PY - 2017/5/9

Y1 - 2017/5/9

AB - Crowdsourcing database systems have been proposed to leverage crowd-powered operations to encapsulate the complexities of interacting with the crowd. Existing systems suffer from two major limitations. Firstly, in order to optimize a query, they often adopt the traditional tree model to select an optimized table-level join order. However, the tree model provides a coarse-grained optimization, which generates the same order for different joined tuples and limits the optimization potential that different joined tuples can be optimized by different orders. Secondly, they mainly focus on optimizing the monetary cost. In fact, there are three optimization goals (i.e., smaller monetary cost, lower latency, and higher quality) in crowdsourcing, and it calls for a system to enable multi-goal optimization. To address the limitations, we develop a crowd-powered database system CDB that supports crowd-based query optimizations, with focus on join and selection. CDB has fundamental differences from existing systems. First, CDB employs a graph-based query model that provides more fine-grained query optimization. Second, CDB adopts a unified framework to perform the multi-goal optimization based on the graph model. We have implemented our system and deployed it on AMT, CrowdFlower and ChinaCrowd. We have also created a benchmark for evaluating crowd-powered databases. We have conducted both simulated and real experiments, and the experimental results demonstrate the performance superiority of CDB on cost, latency and quality.

KW - Crowd-based join

KW - Crowd-based selection

KW - Crowdsourcing

KW - Crowdsourcing optimization

UR - http://www.scopus.com/inward/record.url?scp=85021192548&partnerID=8YFLogxK

U2 - 10.1145/3035918.3064036

DO - 10.1145/3035918.3064036

M3 - Conference contribution

AN - SCOPUS:85021192548

T3 - Proceedings of the ACM SIGMOD International Conference on Management of Data

SP - 1463

EP - 1478

BT - SIGMOD 2017 - Proceedings of the 2017 ACM International Conference on Management of Data

PB - Association for Computing Machinery

Y2 - 14 May 2017 through 19 May 2017

ER -

Li G, Chai C, Fan J, Weng X, Li J, Zheng Y et al. CDB: Optimizing queries with crowd-based selections and joins. In SIGMOD 2017 - Proceedings of the 2017 ACM International Conference on Management of Data. Association for Computing Machinery. 2017. p. 1463-1478. (Proceedings of the ACM SIGMOD International Conference on Management of Data). doi: 10.1145/3035918.3064036
