LearnedSQLGen: Constraint-aware SQL Generation using Reinforcement Learning

Lixi Zhang, Chengliang Chai, Xuanhe Zhou, Guoliang Li

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Peer-review

13 Citations (Scopus)

Abstract

Many database optimization problems, e.g., slow SQL diagnosis, database testing, and optimizer tuning, require a large volume of SQL queries. Due to privacy issues, real SQL queries are hard to obtain, so SQL generation is an important task in database optimization. Existing SQL generation methods either generate SQL queries randomly or rely on human-crafted SQL templates, but they cannot meet various user-specific requirements, e.g., slow SQL queries or SQL queries with large result sizes. To address this problem, this paper studies constraint-aware SQL generation: given a constraint (e.g., cardinality within [1k, 2k]), generate SQL queries that satisfy it. The problem is challenging because the relationship between a query constraint (e.g., cardinality or cost) and SQL queries is hard to capture, which makes it hard to steer a generation method toward queries that meet the constraint. To address this challenge, we propose LearnedSQLGen, a reinforcement learning (RL) based framework for generating queries that satisfy the constraint. LearnedSQLGen adopts an exploration-exploitation strategy that exploits the generation direction following the query constraint, which is learned from query execution feedback. We carefully design the reward function in RL to guide the generation process accurately, and we integrate a finite-state machine to generate valid SQL queries. Experimental results on three benchmarks showed that LearnedSQLGen significantly outperformed the baselines in both accuracy (30% better) and efficiency (10-35 times faster).
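The abstract names three mechanisms: a finite-state machine that keeps generation valid, an exploration-exploitation strategy, and a reward for meeting a constraint. The Python sketch below is a toy illustration of how such pieces can fit together, not the paper's method. The grammar (`TRANSITIONS`), the table and column names, the epsilon-greedy choice, the shape of `range_reward`, and the tabular `update_scores` rule are all hypothetical. The sketch does not execute queries; in practice the cardinality passed to the reward would come from executing or estimating each generated query.

```python
import random

# Hypothetical toy schema; a real system would derive these from the database.
TABLES = ["orders"]
COLUMNS = ["price", "qty"]
OPS = ["<", ">", "="]
VALUES = ["10", "100", "1000"]

# Hypothetical finite-state machine: state -> {allowed token: next state}.
# Only tokens listed for the current state may be emitted, so every
# completed token sequence is a syntactically valid query.
TRANSITIONS = {
    "START": {"SELECT": "SEL"},
    "SEL": {"*": "PROJ"},
    "PROJ": {"FROM": "FROM"},
    "FROM": {t: "TABLE" for t in TABLES},
    "TABLE": {"WHERE": "WHERE", "<EOS>": "END"},
    "WHERE": {c: "COL" for c in COLUMNS},
    "COL": {o: "OP" for o in OPS},
    "OP": {v: "PRED" for v in VALUES},
    "PRED": {"<EOS>": "END", "AND": "WHERE"},
}


def generate(policy_scores, epsilon=0.1, max_len=20, rng=None):
    """Epsilon-greedy walk over the FSM; returns (sql, completed, trajectory)."""
    if rng is None:
        rng = random.Random(0)
    state, tokens, trajectory = "START", [], []
    while state != "END" and len(trajectory) < max_len:
        allowed = list(TRANSITIONS[state])
        if rng.random() < epsilon:
            tok = rng.choice(allowed)  # explore: any FSM-legal token
        else:
            # exploit: highest-scoring legal token (ties go to the first listed)
            tok = max(allowed, key=lambda t: policy_scores.get((state, t), 0.0))
        trajectory.append((state, tok))
        state = TRANSITIONS[state][tok]
        if tok != "<EOS>":
            tokens.append(tok)
    return " ".join(tokens), state == "END", trajectory


def range_reward(cardinality, lo, hi):
    """Hypothetical reward: 1.0 inside [lo, hi], else a penalty in [-1, 0)
    proportional to the relative distance from the nearest bound."""
    if lo <= cardinality <= hi:
        return 1.0
    bound = lo if cardinality < lo else hi
    gap = abs(cardinality - bound)
    return -min(1.0, gap / max(bound, 1))


def update_scores(policy_scores, trajectory, reward, lr=0.5):
    """Crude tabular credit assignment: raise the score of every (state, token)
    choice in the trajectory when the reward is positive, lower it when negative."""
    for step in trajectory:
        policy_scores[step] = policy_scores.get(step, 0.0) + lr * reward
```

For example, after `update_scores(scores, [("TABLE", "<EOS>")], 1.0)`, greedy generation (`epsilon=0.0`) stops before the `WHERE` clause and returns `SELECT * FROM orders`. Because choices are masked by the FSM state, even fully random exploration (`epsilon=1.0`) only emits tokens that are legal at that point; the returned flag is False only when `max_len` cuts a query off.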

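Original language: English
Host publication title: SIGMOD 2022 - Proceedings of the 2022 International Conference on Management of Data
Publisher: Association for Computing Machinery
Pages: 945-958
Number of pages: 14
ISBN (Electronic): 9781450392495
DOI: https://doi.org/10.1145/3514221.3526155
Publication status: Published - 10 Jun 2022
Externally published: Yes
Event: 2022 ACM SIGMOD International Conference on the Management of Data, SIGMOD 2022 - Virtual, Online, United States
Duration: 12 Jun 2022 to 17 Jun 2022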

Publication series

Name: Proceedings of the ACM SIGMOD International Conference on Management of Data
ISSN (Print): 0730-8078

Conference

Conference: 2022 ACM SIGMOD International Conference on the Management of Data, SIGMOD 2022
Country/Territory: United States
City: Virtual, Online
Period: 12/06/22 to 17/06/22

Cite this

Zhang, L., Chai, C., Zhou, X., & Li, G. (2022). LearnedSQLGen: Constraint-aware SQL Generation using Reinforcement Learning. In SIGMOD 2022 - Proceedings of the 2022 International Conference on Management of Data (pp. 945-958). (Proceedings of the ACM SIGMOD International Conference on Management of Data). Association for Computing Machinery. https://doi.org/10.1145/3514221.3526155