Interactively discovering and ranking desired tuples by data exploration

Xuedi Qin, Chengliang Chai*, Yuyu Luo, Tianyu Zhao, Nan Tang, Guoliang Li*, Jianhua Feng, Xiang Yu, Mourad Ouzzani

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

8 Citations (Scopus)

Abstract

Data exploration, the problem of extracting knowledge from a database even when we do not know exactly what we are looking for, is important for data discovery and analysis. However, precisely specifying SQL queries is not always practical for requests such as “finding and ranking off-road cars based on a combination of Price, Make, Model, Age, Mileage, etc.”, not only because of query complexity (e.g., the queries may involve many if-then-else branches and AND, OR, and NOT conditions), but also because the user typically does not know all data instances (and their variants). We propose DExPlorer, a system for interactive data exploration. From the user's perspective, we provide a simple, user-friendly interface that allows users to (1) confirm whether a tuple is desired or not, and (2) decide whether one tuple is preferred over another. Behind the scenes, we jointly use multiple ML models to learn from these two types of user feedback. Moreover, to involve humans in the loop effectively, we need to select a set of tuples for each user interaction to solicit feedback. We therefore devise question selection algorithms that consider not only the estimated benefit of each tuple but also the possible partial orders between any two suggested tuples. Experiments on real-world datasets show that DExPlorer outperforms existing approaches in effectiveness.
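The abstract does not say which ML models or selection criteria DExPlorer actually uses, so the following is only a rough illustration of the two feedback types and of benefit-plus-pair question selection. It assumes tuples are numeric feature vectors and swaps in two simple stand-ins: logistic regression for "desired or not" labels and a perceptron-style linear ranker for "preferred over" feedback. The greedy selector scores each candidate by classifier uncertainty plus a bonus for having ranking scores close to tuples already chosen (so a pairwise comparison would be informative). The function names (`train_classifier`, `train_ranker`, `select_questions`), the weight `alpha`, and the toy data are hypothetical, not the paper's API or algorithm.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_classifier(X, y, epochs=200, lr=0.1):
    """Hypothetical stand-in: logistic regression trained by SGD on
    'desired (1) or not (0)' labels. The last weight is the bias."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        for x, label in zip(X, y):
            xb = list(x) + [1.0]
            p = sigmoid(dot(w, xb))
            w = [wi + lr * (label - p) * xi for wi, xi in zip(w, xb)]
    return w


def train_ranker(X, prefs, epochs=50, lr=0.1):
    """Hypothetical stand-in: perceptron-style linear ranker.
    Each (i, j) in prefs means tuple i is preferred over tuple j."""
    v = [0.0] * len(X[0])
    for _ in range(epochs):
        for i, j in prefs:
            if dot(v, X[i]) <= dot(v, X[j]):  # pair misordered: nudge v
                v = [vk + lr * (a - b) for vk, a, b in zip(v, X[i], X[j])]
    return v


def select_questions(X, w, v, k, alpha=0.5):
    """Greedily pick k tuples to show the user (hypothetical heuristic).
    Per-tuple benefit: 1 when the classifier outputs 0.5, 0 when it is sure.
    Pair benefit: larger when ranking scores are close to chosen tuples."""
    tuple_gain = [1.0 - 2.0 * abs(sigmoid(dot(w, list(x) + [1.0])) - 0.5)
                  for x in X]
    scores = [dot(v, x) for x in X]
    chosen, candidates = [], set(range(len(X)))
    while len(chosen) < k and candidates:
        def gain(c):
            pair = sum(math.exp(-abs(scores[c] - scores[s])) for s in chosen)
            return tuple_gain[c] + alpha * pair
        best = max(sorted(candidates), key=gain)
        chosen.append(best)
        candidates.remove(best)
    return chosen


# Hypothetical toy data: (normalized price, normalized mileage) per tuple.
X = [(0.1, 0.2), (0.9, 0.8), (0.4, 0.5), (0.5, 0.4), (0.8, 0.1)]
w = train_classifier(X[:3], [1, 0, 1])        # labels for X[0], X[1], X[2]
v = train_ranker(X, prefs=[(0, 2), (2, 1)])   # X[0] > X[2] > X[1]
chosen = select_questions(X, w, v, k=2)
print(chosen)
```

Keeping the two learners separate mirrors the abstract's two feedback channels; how DExPlorer combines its models and weighs tuple versus pair benefit is not stated here, so `alpha` is just a tunable placeholder.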

Original language: English
Pages (from-to): 753-777
Number of pages: 25
Journal: VLDB Journal
Volume: 31
Issue number: 4
Publication status: Published - Jul 2022
Externally published: Yes

Keywords

  • Data exploration
  • Decision
  • Human-in-the-loop
  • Ranking
  • SQL query

