TY - JOUR
T1 - CrowdChart
T2 - Crowdsourced Data Extraction from Visualization Charts
AU - Chai, Chengliang
AU - Li, Guoliang
AU - Fan, Ju
AU - Luo, Yuyu
N1 - Publisher Copyright:
© 1989-2012 IEEE.
PY - 2021/11/1
Y1 - 2021/11/1
N2 - Visualization charts are widely utilized for presenting structured data. Under many circumstances, people want to digitalize the data in charts collected from various sources (e.g., papers and websites), in order to further analyze the data or create new charts. However, existing automatic and semi-automatic approaches are not always effective due to the variety of charts. In this paper, we introduce a crowdsourcing approach that leverages human ability to extract data from visualization charts. There are several challenges. The first is how to avoid tedious human interaction with charts and design effective crowdsourcing tasks. Second, it is challenging to evaluate a worker's quality for truth inference, because workers may not only provide inaccurate values but also misalign values to the wrong data series. Third, to guarantee quality, one may assign a task to many workers, leading to a high crowdsourcing cost. To address these challenges, we design an effective crowdsourcing task scheme that splits a chart into simple micro-tasks. We introduce a novel worker quality model that considers both a worker's accuracy and task difficulty. We also devise effective task assignment and early-termination mechanisms to reduce the cost. We evaluate our approach on real-world datasets on real crowdsourcing platforms, and the results demonstrate the effectiveness of our method.
AB - Visualization charts are widely utilized for presenting structured data. Under many circumstances, people want to digitalize the data in charts collected from various sources (e.g., papers and websites), in order to further analyze the data or create new charts. However, existing automatic and semi-automatic approaches are not always effective due to the variety of charts. In this paper, we introduce a crowdsourcing approach that leverages human ability to extract data from visualization charts. There are several challenges. The first is how to avoid tedious human interaction with charts and design effective crowdsourcing tasks. Second, it is challenging to evaluate a worker's quality for truth inference, because workers may not only provide inaccurate values but also misalign values to the wrong data series. Third, to guarantee quality, one may assign a task to many workers, leading to a high crowdsourcing cost. To address these challenges, we design an effective crowdsourcing task scheme that splits a chart into simple micro-tasks. We introduce a novel worker quality model that considers both a worker's accuracy and task difficulty. We also devise effective task assignment and early-termination mechanisms to reduce the cost. We evaluate our approach on real-world datasets on real crowdsourcing platforms, and the results demonstrate the effectiveness of our method.
KW - Data visualization
KW - crowdsourcing
KW - task assignment
KW - truth inference
UR - http://www.scopus.com/inward/record.url?scp=85116925950&partnerID=8YFLogxK
U2 - 10.1109/TKDE.2020.2972543
DO - 10.1109/TKDE.2020.2972543
M3 - Article
AN - SCOPUS:85116925950
SN - 1041-4347
VL - 33
SP - 3537
EP - 3549
JO - IEEE Transactions on Knowledge and Data Engineering
JF - IEEE Transactions on Knowledge and Data Engineering
IS - 11
ER -