TY - GEN
T1 - Reverse engineering complex join queries
AU - Zhang, Meihui
AU - Elmeleegy, Hazem
AU - Procopiuc, Cecilia M.
AU - Srivastava, Divesh
PY - 2013
Y1 - 2013
N2 - We study the following problem: Given a database D with schema G and an output table Out, compute a join query Q that generates Out from D. A simpler variant allows Q to return a superset of Out. This problem has numerous applications, both by itself, and as a building block for other problems. Related prior work imposes conditions on the structure of Q which are not always consistent with the application, but simplify computation. We discuss several natural SQL queries that do not satisfy these conditions and cannot be discovered by prior work. In this paper, we propose an efficient algorithm that discovers queries with arbitrary join graphs. A crucial insight is that any graph can be characterized by the combination of a simple structure, called a star, and a series of merge steps over the star. The merge steps define a lattice over graphs derived from the same star. This allows us to explore the set of candidate solutions in a principled way and quickly prune out a large number of infeasible graphs. We also design several optimizations that significantly reduce the running time. Finally, we conduct an extensive experimental study over a benchmark database and show that our approach is scalable and accurately discovers complex join queries.
AB - We study the following problem: Given a database D with schema G and an output table Out, compute a join query Q that generates Out from D. A simpler variant allows Q to return a superset of Out. This problem has numerous applications, both by itself, and as a building block for other problems. Related prior work imposes conditions on the structure of Q which are not always consistent with the application, but simplify computation. We discuss several natural SQL queries that do not satisfy these conditions and cannot be discovered by prior work. In this paper, we propose an efficient algorithm that discovers queries with arbitrary join graphs. A crucial insight is that any graph can be characterized by the combination of a simple structure, called a star, and a series of merge steps over the star. The merge steps define a lattice over graphs derived from the same star. This allows us to explore the set of candidate solutions in a principled way and quickly prune out a large number of infeasible graphs. We also design several optimizations that significantly reduce the running time. Finally, we conduct an extensive experimental study over a benchmark database and show that our approach is scalable and accurately discovers complex join queries.
KW - Query join graph
KW - Query lattice
KW - SQL query discovery
UR - http://www.scopus.com/inward/record.url?scp=84880525761&partnerID=8YFLogxK
U2 - 10.1145/2463676.2465320
DO - 10.1145/2463676.2465320
M3 - Conference contribution
AN - SCOPUS:84880525761
SN - 9781450320375
T3 - Proceedings of the ACM SIGMOD International Conference on Management of Data
SP - 809
EP - 820
BT - SIGMOD 2013 - International Conference on Management of Data
T2 - 2013 ACM SIGMOD Conference on Management of Data, SIGMOD 2013
Y2 - 22 June 2013 through 27 June 2013
ER -