TY - GEN
T1 - LearnedSQLGen
T2 - 2022 ACM SIGMOD International Conference on the Management of Data, SIGMOD 2022
AU - Zhang, Lixi
AU - Chai, Chengliang
AU - Zhou, Xuanhe
AU - Li, Guoliang
N1 - Publisher Copyright: © 2022 ACM.
PY - 2022/6/10
Y1 - 2022/6/10
AB - Many database optimization problems, e.g., slow SQL diagnosis, database testing, optimizer tuning, require a large volume of SQL queries. Due to privacy issues, it is hard to obtain real SQL queries, and thus SQL generation is a very important task in database optimization. Existing SQL generation methods either randomly generate SQL queries or rely on human-crafted SQL templates to generate SQL queries, but they cannot meet various user specific requirements, e.g., slow SQL queries, SQL queries with large result sizes. To address this problem, this paper studies the problem of constraint-aware SQL generation, which, given a constraint (e.g., cardinality within [1k,2k]), generates SQL queries satisfying the constraint. This problem is rather challenging, because it is rather hard to capture the relationship from query constraint (e.g., cardinality and cost) to SQL queries and thus it is hard to guide a generation method to explore the SQL generation direction towards meeting the constraint. To address this challenge, we propose a reinforcement learning (RL) based framework LearnedSQLGen, for generating queries satisfying the constraint. LearnedSQLGen adopts an exploration-exploitation strategy that exploits the generation direction following the query constraint, which is learned from query execution feedback. We judiciously design the reward function in RL to guide the generation process accurately. We integrate a finite-state machine to generate valid SQL queries. Experimental results on three benchmarks showed that LearnedSQLGen significantly outperformed the baselines in terms of both accuracy (30% better) and efficiency (10-35 times).
KW - SQL generation
KW - database
KW - machine learning
UR - http://www.scopus.com/inward/record.url?scp=85132790127&partnerID=8YFLogxK
U2 - 10.1145/3514221.3526155
DO - 10.1145/3514221.3526155
M3 - Conference contribution
AN - SCOPUS:85132790127
T3 - Proceedings of the ACM SIGMOD International Conference on Management of Data
SP - 945
EP - 958
BT - SIGMOD 2022 - Proceedings of the 2022 International Conference on Management of Data
PB - Association for Computing Machinery
Y2 - 12 June 2022 through 17 June 2022
ER -