TY - GEN
T1 - Interactive cleaning for progressive visualization through composite questions
AU - Luo, Yuyu
AU - Chai, Chengliang
AU - Qin, Xuedi
AU - Tang, Nan
AU - Li, Guoliang
N1 - Publisher Copyright: © 2020 IEEE.
PY - 2020/4
Y1 - 2020/4
AB - In this paper, we study the problem of interactive cleaning for progressive visualization (ICPV): given a bad visualization V, the goal is to obtain a "cleaned" visualization V' that is far from V in distance, under a given (small) budget w.r.t. human cost. In ICPV, a system interacts with a user iteratively. During each iteration, it asks the user a data cleaning question such as "how to clean detected errors x?", and takes value updates from the user to clean V. Conventional wisdom typically picks a single question (e.g., "Are SIGMOD conference and SIGMOD the same?") with the maximum expected benefit in each iteration. We propose to use a composite question, i.e., a group of single questions treated as one question, in each iteration (for example, "Are SIGMOD conference in t1 and SIGMOD in t2 the same value, and are t1 and t2 duplicates?"). A composite question is presented to the user as a small connected graph through a novel GUI that the user can directly operate on. We propose algorithms to select the best composite question in each iteration. Experiments on real-world datasets verify that composite questions are more effective than asking single questions in isolation w.r.t. the human cost.
UR - http://www.scopus.com/inward/record.url?scp=85085867523&partnerID=8YFLogxK
U2 - 10.1109/ICDE48307.2020.00069
DO - 10.1109/ICDE48307.2020.00069
M3 - Conference contribution
AN - SCOPUS:85085867523
T3 - Proceedings - International Conference on Data Engineering
SP - 733
EP - 744
BT - Proceedings - 2020 IEEE 36th International Conference on Data Engineering, ICDE 2020
PB - IEEE Computer Society
T2 - 36th IEEE International Conference on Data Engineering, ICDE 2020
Y2 - 20 April 2020 through 24 April 2020
ER -