Crowdsourcing-based data extraction from visualization charts

Chengliang Chai, Guoliang Li, Ju Fan, Yuyu Luo

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

7 Citations (Scopus)

Abstract

Visualization charts are widely used to present structured data. In many cases, people want to explore the data in charts collected from various sources, such as papers and websites, in order to analyze the data further or create new charts. However, existing automatic and semi-automatic approaches are not always effective because of the variety of charts. In this paper, we introduce a crowdsourcing approach that leverages human ability to extract data from visualization charts. There are several challenges. The first is how to avoid tedious human interaction with charts and design simple crowdsourcing tasks. The second is that evaluating workers' quality for truth inference is difficult, because workers may not only provide inaccurate values but also assign values to the wrong data series. To address these challenges, we design an effective crowdsourcing task scheme that splits a chart into simple micro-tasks. We introduce a novel worker quality model that considers both worker accuracy and task difficulty. We also devise an effective early-stopping mechanism to reduce cost. We have conducted experiments on a real crowdsourcing platform, and the results show that our framework outperforms state-of-the-art approaches on both cost and quality.
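As an illustration of the ideas named in the abstract (truth inference weighted by worker quality, plus early stopping to save cost), a minimal sketch follows. It is not the paper's model: the abstract does not specify the quality model or the stopping rule, so the quality-weighted median, the function names, the default quality of 0.5, and the 5% agreement tolerance are all assumptions made for this example. Task difficulty and series misalignment are not modeled.

```python
from statistics import median


def weighted_median(values, weights):
    """Return the value at which cumulative weight first reaches half the total."""
    pairs = sorted(zip(values, weights))
    total = sum(weights)
    running = 0.0
    for value, weight in pairs:
        running += weight
        if running >= total / 2:
            return value
    raise ValueError("no answers")


def should_stop(values, min_answers=3, rel_tol=0.05):
    """Hypothetical early-stopping rule: stop once enough answers agree closely."""
    if len(values) < min_answers:
        return False
    mid = median(values)
    scale = max(abs(mid), 1e-9)  # avoid division by zero when the median is 0
    return all(abs(v - mid) / scale <= rel_tol for v in values)


def infer_value(answer_stream, quality, default_quality=0.5):
    """Consume (worker_id, value) answers one by one.

    `quality` maps worker_id to an assumed accuracy weight in (0, 1].
    Returns (estimate, number_of_answers_used).
    """
    values, weights = [], []
    for worker_id, value in answer_stream:
        values.append(value)
        weights.append(quality.get(worker_id, default_quality))
        if should_stop(values):
            break  # remaining answers are never requested, which saves cost
    return weighted_median(values, weights), len(values)


# Example: three workers read roughly the same bar height; a fourth is never asked.
answers = [("w1", 10.0), ("w2", 10.2), ("w3", 9.9), ("w4", 50.0)]
quality = {"w1": 0.9, "w2": 0.8, "w3": 0.7, "w4": 0.2}
estimate, used = infer_value(answers, quality)
```

A weighted median is used here rather than a weighted mean because a single badly misread value then has limited influence on the estimate; the paper may well use a different aggregation.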

Original language: English
Title of host publication: Proceedings - 2020 IEEE 36th International Conference on Data Engineering, ICDE 2020
Publisher: IEEE Computer Society
Pages: 1814-1817
Number of pages: 4
ISBN (Electronic): 9781728129037
Publication status: Published - Apr 2020
Externally published: Yes
Event: 36th IEEE International Conference on Data Engineering, ICDE 2020 - Dallas, United States
Duration: 20 Apr 2020 to 24 Apr 2020

Publication series

Name: Proceedings - International Conference on Data Engineering
Volume: 2020-April
ISSN (Print): 1084-4627

Conference

Conference: 36th IEEE International Conference on Data Engineering, ICDE 2020
Country/Territory: United States
City: Dallas
Period: 20/04/20 to 24/04/20
