TY - JOUR
T1 - Interactively discovering and ranking desired tuples by data exploration
AU - Qin, Xuedi
AU - Chai, Chengliang
AU - Luo, Yuyu
AU - Zhao, Tianyu
AU - Tang, Nan
AU - Li, Guoliang
AU - Feng, Jianhua
AU - Yu, Xiang
AU - Ouzzani, Mourad
N1 - Publisher Copyright:
© 2021, The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.
PY - 2022/7
Y1 - 2022/7
N2 - Data exploration—the problem of extracting knowledge from database even if we do not know exactly what we are looking for —is important for data discovery and analysis. However, precisely specifying SQL queries is not always practical, such as “finding and ranking off-road cars based on a combination of Price, Make, Model, Age, Mileage, etc”—not only due to the query complexity (e.g.,the queries may have many if-then-else, and, or and not logic), but also because the user typically does not have the knowledge of all data instances (and their variants). We propose DExPlorer, a system for interactive data exploration. From the user perspective, we propose a simple and user-friendly interface, which allows to: (1) confirm whether a tuple is desired or not, and (2) decide whether a tuple is more preferred than another. Behind the scenes, we jointly use multiple ML models to learn from the above two types of user feedback. Moreover, in order to effectively involve human-in-the-loop, we need to select a set of tuples for each user interaction so as to solicit feedback. Therefore, we devise question selection algorithms, which consider not only the estimated benefit of each tuple, but also the possible partial orders between any two suggested tuples. Experiments on real-world datasets show that DExPlorer outperforms existing approaches in effectiveness.
AB - Data exploration—the problem of extracting knowledge from database even if we do not know exactly what we are looking for —is important for data discovery and analysis. However, precisely specifying SQL queries is not always practical, such as “finding and ranking off-road cars based on a combination of Price, Make, Model, Age, Mileage, etc”—not only due to the query complexity (e.g.,the queries may have many if-then-else, and, or and not logic), but also because the user typically does not have the knowledge of all data instances (and their variants). We propose DExPlorer, a system for interactive data exploration. From the user perspective, we propose a simple and user-friendly interface, which allows to: (1) confirm whether a tuple is desired or not, and (2) decide whether a tuple is more preferred than another. Behind the scenes, we jointly use multiple ML models to learn from the above two types of user feedback. Moreover, in order to effectively involve human-in-the-loop, we need to select a set of tuples for each user interaction so as to solicit feedback. Therefore, we devise question selection algorithms, which consider not only the estimated benefit of each tuple, but also the possible partial orders between any two suggested tuples. Experiments on real-world datasets show that DExPlorer outperforms existing approaches in effectiveness.
KW - Data exploration
KW - Decision
KW - Human-in-the-loop
KW - Ranking
KW - SQL query
UR - http://www.scopus.com/inward/record.url?scp=85123090782&partnerID=8YFLogxK
U2 - 10.1007/s00778-021-00714-0
DO - 10.1007/s00778-021-00714-0
M3 - Article
AN - SCOPUS:85123090782
SN - 1066-8888
VL - 31
SP - 753
EP - 777
JO - VLDB Journal
JF - VLDB Journal
IS - 4
ER -