LearnedSQLGen: Constraint-aware SQL Generation using Reinforcement Learning

Lixi Zhang, Chengliang Chai, Xuanhe Zhou, Guoliang Li

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

13 Citations (Scopus)

Abstract

Many database optimization problems, such as slow SQL diagnosis, database testing, and optimizer tuning, require a large volume of SQL queries. Due to privacy issues, it is hard to obtain real SQL queries, and thus SQL generation is a very important task in database optimization. Existing SQL generation methods either generate SQL queries randomly or rely on human-crafted SQL templates, but they cannot meet various user-specific requirements, e.g., slow SQL queries or SQL queries with large result sizes. To address this problem, this paper studies constraint-aware SQL generation, which, given a constraint (e.g., cardinality within [1k,2k]), generates SQL queries satisfying the constraint. This problem is challenging because it is hard to capture the relationship between query constraints (e.g., cardinality and cost) and SQL queries, and thus hard to guide a generation method toward queries that meet the constraint. To address this challenge, we propose a reinforcement learning (RL) based framework, LearnedSQLGen, for generating queries satisfying the constraint. LearnedSQLGen adopts an exploration-exploitation strategy that exploits the generation direction following the query constraint, which is learned from query execution feedback. We judiciously design the reward function in RL to guide the generation process accurately. We integrate a finite-state machine to generate valid SQL queries. Experimental results on three benchmarks showed that LearnedSQLGen significantly outperformed the baselines in terms of both accuracy (30% better) and efficiency (10-35 times).
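The two mechanisms named in the abstract, a finite-state machine that keeps generated SQL valid and a reward tied to a constraint such as a cardinality range, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy grammar, the table and column names, the `range_reward` formula, and the use of uniform random sampling in place of a learned policy are hypothetical and are not taken from the paper.

```python
import random

# Hypothetical toy grammar: state -> {token: next_state}.
# This is NOT the paper's FSM; it only shows how an FSM restricts next tokens.
TRANSITIONS = {
    "START": {"SELECT": "COLS"},
    "COLS": {"*": "FROM"},
    "FROM": {"FROM": "TABLE"},
    "TABLE": {"orders": "AFTER_TABLE", "customers": "AFTER_TABLE"},
    "AFTER_TABLE": {"WHERE": "PRED_COL", "<EOS>": "END"},
    "PRED_COL": {"id": "PRED_OP"},
    "PRED_OP": {"<": "PRED_VAL", ">": "PRED_VAL"},
    "PRED_VAL": {"100": "AFTER_PRED", "1000": "AFTER_PRED"},
    "AFTER_PRED": {"<EOS>": "END"},
}


def valid_tokens(state):
    """Tokens the FSM allows in this state (the action mask for a policy)."""
    return sorted(TRANSITIONS[state])


def generate(rng):
    """Sample one query; a learned policy would replace the uniform choice."""
    state, tokens = "START", []
    while state != "END":
        token = rng.choice(valid_tokens(state))
        state = TRANSITIONS[state][token]
        if token != "<EOS>":
            tokens.append(token)
    return " ".join(tokens)


def range_reward(cardinality, low, high):
    """Assumed reward shape: 1.0 inside [low, high], else decays with distance.

    Requires 0 < low <= high. The formula is illustrative only.
    """
    if low <= cardinality <= high:
        return 1.0
    if cardinality < low:
        return cardinality / low
    return high / cardinality


if __name__ == "__main__":
    rng = random.Random(0)
    print(generate(rng))
    print(range_reward(500, 1000, 2000))
```

In a full RL setup, the cardinality passed to `range_reward` would come from executing or estimating the generated query against the database, and that feedback would update the policy that picks among `valid_tokens(state)`.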

Original language: English
Title of host publication: SIGMOD 2022 - Proceedings of the 2022 International Conference on Management of Data
Publisher: Association for Computing Machinery
Pages: 945-958
Number of pages: 14
ISBN (Electronic): 9781450392495
Publication status: Published - 10 Jun 2022
Externally published: Yes
Event: 2022 ACM SIGMOD International Conference on the Management of Data, SIGMOD 2022 - Virtual, Online, United States
Duration: 12 Jun 2022 - 17 Jun 2022

Publication series

Name: Proceedings of the ACM SIGMOD International Conference on Management of Data
ISSN (Print): 0730-8078

Conference

Conference: 2022 ACM SIGMOD International Conference on the Management of Data, SIGMOD 2022
Country/Territory: United States
City: Virtual, Online
Period: 12/06/22 - 17/06/22

Keywords

  • SQL generation
  • database
  • machine learning
