TY - JOUR
T1 - CardOOD
T2 - robust query-driven cardinality estimation under out-of-distribution
AU - Li, Rui
AU - Zhao, Kangfei
AU - Yu, Jeffrey Xu
AU - Wang, Guoren
N1 - Publisher Copyright: © The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2026.
PY - 2026/7
Y1 - 2026/7
AB - Query-driven learned estimators are accurate, flexible, and lightweight alternatives to traditional estimators in query optimization. However, existing query-driven approaches struggle with the problem of Out-of-Distribution (OOD), where the test workload distribution differs from the training workload, leading to significant performance degradation. In this paper, we present CardOOD, a modular learning framework designed to construct robust query-driven cardinality estimators that are resilient against the OOD problem. Our framework focuses on offline training algorithms that develop one-off models from a static workload, suitable for model initialization and periodic retraining. In CardOOD, we systematically adapt prevailing transfer learning and robust learning techniques, falling into three categories: representation learning, data manipulation, and new learning strategies, and instantiate them for training cardinality estimators. Beyond transferring existing techniques, we propose a novel learning algorithm, OrderEmb, tailored to the specific properties of cardinality estimation. This algorithm, lying in the category of learning strategy, exploits the partial-order constraint on query cardinalities induced by predicate containment. We provide a theoretical analysis of OrderEmb, justifying its ability to enhance representation quality by maximizing mutual information. Comprehensive experimental studies demonstrate the efficacy of the algorithms of CardOOD in mitigating the OOD problem to varying extents. We further integrate CardOOD into PostgreSQL, showcasing its practical utility in end-to-end query optimization.
KW - Cardinality estimation
KW - Out-of-distribution
KW - Query optimization
UR - https://www.scopus.com/pages/publications/105039659111
U2 - 10.1007/s00778-026-00979-3
DO - 10.1007/s00778-026-00979-3
M3 - Article
AN - SCOPUS:105039659111
SN - 1066-8888
VL - 35
JO - VLDB Journal
JF - VLDB Journal
IS - 4
M1 - 28
ER -