Cost-based or Learning-based? A Hybrid Query Optimizer for Query Plan Selection

Xiang Yu; Chengliang Chai; Guoliang Li; Jiabin Liu

doi:10.14778/3565838.3565846

Tsinghua University

Research output: Contribution to journal › Conference article › peer-review

37 Citations (Scopus)

Abstract

Traditional cost-based optimizers are efficient and stable, and they generate optimal plans for simple SQL queries, but they may not generate high-quality plans for complicated queries. Learning-based optimizers have therefore been proposed recently; they learn to produce high-quality plans from past experience. However, learning-based optimizers do not work well for dynamic workloads whose distributions differ from those of the training examples. In this paper, we propose a hybrid optimizer that combines the advantages and avoids the shortcomings of these two types of optimizers: it first generates high-quality candidate plans from each type of optimizer and then selects the best plan from the candidates. There are two challenges. (1) How to generate high-quality candidates? We propose a hint-based candidate generation method that uses the learning-based method to generate highly beneficial hints and then uses a cost-based method to supplement the hints and produce complete plans as candidates. (2) How to evaluate the candidate plans and select the best one? We propose an uncertainty-based optimal plan selection model, which predicts the execution time and the uncertainty for each plan. The uncertainty reflects the confidence of the execution time prediction. We select the plan using the uncertainty model. Experimental results on real datasets showed that our method outperformed the state-of-the-art baselines and reduced the total latency by 25% and the tail latency by 65% compared to PostgreSQL.
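The abstract does not spell out how the predicted execution time and the uncertainty are combined when choosing a plan. The following is only a minimal sketch of one plausible rule, not the authors' method: trust a prediction only if its uncertainty is below a threshold, pick the fastest trusted plan, and otherwise fall back to the cost-based optimizer's plan. The names `Candidate`, `select_plan`, and `max_uncertainty` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """One candidate plan with its model outputs (all fields hypothetical)."""
    name: str               # label for the plan
    predicted_ms: float     # predicted execution time in milliseconds
    uncertainty: float      # higher value means a less confident prediction
    from_cost_based: bool   # True for the cost-based optimizer's own plan


def select_plan(candidates: List[Candidate], max_uncertainty: float) -> Candidate:
    """Pick a plan using predicted time and uncertainty.

    Assumed rule (not taken from the paper): keep only plans whose
    uncertainty is at most max_uncertainty and return the one with the
    lowest predicted time. If no prediction is trusted, return the
    cost-based plan as a safe default.
    """
    trusted = [c for c in candidates if c.uncertainty <= max_uncertainty]
    if trusted:
        return min(trusted, key=lambda c: c.predicted_ms)
    for c in candidates:
        if c.from_cost_based:
            return c
    raise ValueError("no trusted candidate and no cost-based fallback plan")


if __name__ == "__main__":
    plans = [
        Candidate("default", 120.0, 0.1, True),
        Candidate("hinted_a", 40.0, 0.9, False),
        Candidate("hinted_b", 80.0, 0.2, False),
    ]
    print(select_plan(plans, max_uncertainty=0.5).name)
```

With this threshold, the plan with the lowest predicted time is skipped because its prediction is not trusted. A real system would need to calibrate the threshold, or replace the hard cutoff with a different scoring rule.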

Original language	English
Pages (from-to)	3924-3936
Number of pages	13
Journal	Proceedings of the VLDB Endowment
Volume	15
Issue number	13
DOIs	https://doi.org/10.14778/3565838.3565846
Publication status	Published - 2022
Externally published	Yes
Event	48th International Conference on Very Large Data Bases, VLDB 2022 - Sydney, Australia
Duration	5 Sept 2022 → 9 Sept 2022

Cite this

Yu, X., Chai, C., Li, G., & Liu, J. (2022). Cost-based or Learning-based? A Hybrid Query Optimizer for Query Plan Selection. Proceedings of the VLDB Endowment, 15(13), 3924-3936. https://doi.org/10.14778/3565838.3565846

@article{e511fa5158ad42869d995eca925a4532,

title = "Cost-based or Learning-based? A Hybrid Query Optimizer for Query Plan Selection",

abstract = "Traditional cost-based optimizers are efficient and stable to generate optimal plans for simple SQL queries, but they may not generate high-quality plans for complicated queries. Thus learning-based optimizers have been proposed recently that can learn high-quality plans based on past experiences. However, learning-based optimizers cannot work well for dynamic workloads that have different distributions with training examples. In this paper, we propose a hybrid optimizer that adopts the advantages and avoids the shortcomings of these two types of optimizers, which first generates high-quality candidate plans from each type of optimizers and then selects the best plan from the candidates. There are two challenges. (1) How to generate high-quality candidates? We propose a hint-based candidate generation method that leverages the learning-based method to generate highly beneficial hints and then uses a cost-based method to supplement the hints to generate complete plans as candidates. (2) How to evaluate different candidate plans and select the best one? We propose an uncertainty-based optimal plan selection model, which predicts the execution time and the uncertainty for each plan. The uncertainty reflects the confidence of the execution time prediction. We select the plan using the uncertainty model. Experiment results on real datasets showed that our method outperformed the state-of-the-art baselines, and reduced the total latency by 25% and the tail latency by 65% compared to PostgreSQL.",

author = "Xiang Yu and Chengliang Chai and Guoliang Li and Jiabin Liu",

note = "Publisher Copyright: {\textcopyright} 2022, VLDB Endowment. All rights reserved.; 48th International Conference on Very Large Data Bases, VLDB 2022 ; Conference date: 05-09-2022 Through 09-09-2022",

year = "2022",

doi = "10.14778/3565838.3565846",

language = "English",

volume = "15",

pages = "3924--3936",

journal = "Proceedings of the VLDB Endowment",

issn = "2150-8097",

publisher = "Very Large Data Base Endowment Inc.",

number = "13",

}

TY - JOUR

T1 - Cost-based or Learning-based? A Hybrid Query Optimizer for Query Plan Selection

AU - Yu, Xiang

AU - Chai, Chengliang

AU - Li, Guoliang

AU - Liu, Jiabin

PY - 2022

Y1 - 2022

N2 - Traditional cost-based optimizers are efficient and stable to generate optimal plans for simple SQL queries, but they may not generate high-quality plans for complicated queries. Thus learning-based optimizers have been proposed recently that can learn high-quality plans based on past experiences. However, learning-based optimizers cannot work well for dynamic workloads that have different distributions with training examples. In this paper, we propose a hybrid optimizer that adopts the advantages and avoids the shortcomings of these two types of optimizers, which first generates high-quality candidate plans from each type of optimizers and then selects the best plan from the candidates. There are two challenges. (1) How to generate high-quality candidates? We propose a hint-based candidate generation method that leverages the learning-based method to generate highly beneficial hints and then uses a cost-based method to supplement the hints to generate complete plans as candidates. (2) How to evaluate different candidate plans and select the best one? We propose an uncertainty-based optimal plan selection model, which predicts the execution time and the uncertainty for each plan. The uncertainty reflects the confidence of the execution time prediction. We select the plan using the uncertainty model. Experiment results on real datasets showed that our method outperformed the state-of-the-art baselines, and reduced the total latency by 25% and the tail latency by 65% compared to PostgreSQL.

AB - Traditional cost-based optimizers are efficient and stable to generate optimal plans for simple SQL queries, but they may not generate high-quality plans for complicated queries. Thus learning-based optimizers have been proposed recently that can learn high-quality plans based on past experiences. However, learning-based optimizers cannot work well for dynamic workloads that have different distributions with training examples. In this paper, we propose a hybrid optimizer that adopts the advantages and avoids the shortcomings of these two types of optimizers, which first generates high-quality candidate plans from each type of optimizers and then selects the best plan from the candidates. There are two challenges. (1) How to generate high-quality candidates? We propose a hint-based candidate generation method that leverages the learning-based method to generate highly beneficial hints and then uses a cost-based method to supplement the hints to generate complete plans as candidates. (2) How to evaluate different candidate plans and select the best one? We propose an uncertainty-based optimal plan selection model, which predicts the execution time and the uncertainty for each plan. The uncertainty reflects the confidence of the execution time prediction. We select the plan using the uncertainty model. Experiment results on real datasets showed that our method outperformed the state-of-the-art baselines, and reduced the total latency by 25% and the tail latency by 65% compared to PostgreSQL.

UR - http://www.scopus.com/inward/record.url?scp=85147798886&partnerID=8YFLogxK

U2 - 10.14778/3565838.3565846

DO - 10.14778/3565838.3565846

M3 - Conference article

AN - SCOPUS:85147798886

SN - 2150-8097

VL - 15

SP - 3924

EP - 3936

JO - Proceedings of the VLDB Endowment

JF - Proceedings of the VLDB Endowment

IS - 13

T2 - 48th International Conference on Very Large Data Bases, VLDB 2022

Y2 - 5 September 2022 through 9 September 2022

ER -
