TY - GEN
T1 - Crowdsourcing-based data extraction from visualization charts
AU - Chai, Chengliang
AU - Li, Guoliang
AU - Fan, Ju
AU - Luo, Yuyu
N1 - Publisher Copyright:
© 2020 IEEE.
PY - 2020/4
Y1 - 2020/4
N2 - Visualization charts are widely used to present structured data. In many circumstances, people want to extract the data underlying charts collected from various sources, such as papers and websites, so as to further analyze the data or create new charts. However, existing automatic and semi-automatic approaches are not always effective due to the variety of charts. In this paper, we introduce a crowdsourcing approach that leverages human ability to extract data from visualization charts. There are several challenges. The first is how to avoid tedious human interaction with charts and design simple crowdsourcing tasks. Second, it is challenging to evaluate worker quality for truth inference, because workers may not only provide inaccurate values but also misalign values with the wrong data series. To address these challenges, we design an effective crowdsourcing task scheme that splits a chart into simple micro-tasks. We introduce a novel worker quality model that considers both a worker's accuracy and task difficulty. We also devise an effective early-stopping mechanism to save cost. We have conducted experiments on a real crowdsourcing platform, and the results show that our framework outperforms state-of-the-art approaches in both cost and quality.
AB - Visualization charts are widely used to present structured data. In many circumstances, people want to extract the data underlying charts collected from various sources, such as papers and websites, so as to further analyze the data or create new charts. However, existing automatic and semi-automatic approaches are not always effective due to the variety of charts. In this paper, we introduce a crowdsourcing approach that leverages human ability to extract data from visualization charts. There are several challenges. The first is how to avoid tedious human interaction with charts and design simple crowdsourcing tasks. Second, it is challenging to evaluate worker quality for truth inference, because workers may not only provide inaccurate values but also misalign values with the wrong data series. To address these challenges, we design an effective crowdsourcing task scheme that splits a chart into simple micro-tasks. We introduce a novel worker quality model that considers both a worker's accuracy and task difficulty. We also devise an effective early-stopping mechanism to save cost. We have conducted experiments on a real crowdsourcing platform, and the results show that our framework outperforms state-of-the-art approaches in both cost and quality.
UR - http://www.scopus.com/inward/record.url?scp=85085859037&partnerID=8YFLogxK
U2 - 10.1109/ICDE48307.2020.00177
DO - 10.1109/ICDE48307.2020.00177
M3 - Conference contribution
AN - SCOPUS:85085859037
T3 - Proceedings - International Conference on Data Engineering
SP - 1814
EP - 1817
BT - Proceedings - 2020 IEEE 36th International Conference on Data Engineering, ICDE 2020
PB - IEEE Computer Society
T2 - 36th IEEE International Conference on Data Engineering, ICDE 2020
Y2 - 20 April 2020 through 24 April 2020
ER -