TY - JOUR
T1 - CrowdChart
T2 - Crowdsourced Data Extraction from Visualization Charts
AU - Chai, Chengliang
AU - Li, Guoliang
AU - Fan, Ju
AU - Luo, Yuyu
N1 - Publisher Copyright:
© 1989-2012 IEEE.
PY - 2021/11/1
Y1 - 2021/11/1
N2 - Visualization charts are widely utilized for presenting structured data. Under many circumstances, people want to digitalize the data in charts collected from various sources (e.g., papers and websites), in order to further analyze the data or create new charts. However, existing automatic and semi-automatic approaches are not always effective due to the variety of charts. In this paper, we introduce a crowdsourcing approach that leverages human ability to extract data from visualization charts. There are several challenges. The first is how to avoid tedious human interaction with charts and design effective crowdsourcing tasks. Second, it is challenging to evaluate a worker's quality for truth inference, because workers may not only provide inaccurate values but also misalign values to the wrong data series. Third, to guarantee quality, one may assign a task to many workers, leading to a high crowdsourcing cost. To address these challenges, we design an effective crowdsourcing task scheme that splits a chart into simple micro-tasks. We introduce a novel worker quality model that considers both a worker's accuracy and task difficulty. We also devise effective task assignment and early-termination mechanisms to reduce the cost. We evaluate our approach on real-world datasets on real crowdsourcing platforms, and the results demonstrate the effectiveness of our method.
AB - Visualization charts are widely utilized for presenting structured data. Under many circumstances, people want to digitalize the data in charts collected from various sources (e.g., papers and websites), in order to further analyze the data or create new charts. However, existing automatic and semi-automatic approaches are not always effective due to the variety of charts. In this paper, we introduce a crowdsourcing approach that leverages human ability to extract data from visualization charts. There are several challenges. The first is how to avoid tedious human interaction with charts and design effective crowdsourcing tasks. Second, it is challenging to evaluate a worker's quality for truth inference, because workers may not only provide inaccurate values but also misalign values to the wrong data series. Third, to guarantee quality, one may assign a task to many workers, leading to a high crowdsourcing cost. To address these challenges, we design an effective crowdsourcing task scheme that splits a chart into simple micro-tasks. We introduce a novel worker quality model that considers both a worker's accuracy and task difficulty. We also devise effective task assignment and early-termination mechanisms to reduce the cost. We evaluate our approach on real-world datasets on real crowdsourcing platforms, and the results demonstrate the effectiveness of our method.
KW - Data visualization
KW - crowdsourcing
KW - task assignment
KW - truth inference
UR - http://www.scopus.com/inward/record.url?scp=85116925950&partnerID=8YFLogxK
U2 - 10.1109/TKDE.2020.2972543
DO - 10.1109/TKDE.2020.2972543
M3 - Article
AN - SCOPUS:85116925950
SN - 1041-4347
VL - 33
SP - 3537
EP - 3549
JO - IEEE Transactions on Knowledge and Data Engineering
JF - IEEE Transactions on Knowledge and Data Engineering
IS - 11
ER -