MAXENT: consistent cardinality estimation in action

Proceedings of the 2006 ACM SIGMOD international conference on Management of data Pub Date : 2006-06-27 DOI:10.1145/1142473.1142586

V. Markl, M. Kutsch, T. Tran, P. Haas, N. Megiddo


Citation count: 5

Abstract

When comparing alternative query execution plans (QEPs), a cost-based query optimizer in a relational database management system needs to estimate the selectivity of conjunctive predicates. To avoid inaccurate independence assumptions, modern optimizers try to exploit multivariate statistics (MVS) that provide knowledge about joint frequencies in a table of a relation. Because the complete joint distribution is almost always too large to store, optimizers are given only partial knowledge about this distribution. As a result, there exist multiple, non-equivalent ways to estimate the selectivity of a conjunctive predicate. To consistently combine the partial knowledge during the estimation process, existing optimizers employ cumbersome ad hoc heuristics. These methods unjustifiably ignore valuable information, and the optimizer tends to favor QEPs for which the least information is available. This bias problem yields poor QEP quality and performance. We demonstrate MAXENT, a novel approach based on the maximum entropy principle, prototyped in IBM DB2 LUW. We illustrate MAXENT's ability to consistently estimate the selectivity of conjunctive predicates on a per-table basis. In contrast to the DB2 optimizer's current ad hoc methods, we show how MAXENT exploits all available information about the joint column distribution and thus avoids the bias problem. For some complex queries against a real-world database, we show that MAXENT improves selectivity estimates by orders of magnitude relative to the current DB2 optimizer, and also show how these improved estimates influence plan choices as well as query execution times.
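
To make the idea of consistently combining partial knowledge concrete, the following is a minimal sketch, not the DB2 prototype's implementation. It assumes three predicates on one table, treats each known selectivity (single-column or multivariate) as a constraint on the joint distribution over the 2^3 truth assignments, and finds the maximum-entropy distribution with iterative scaling, which is one standard solver; the abstract does not state which solver MAXENT uses. The function name `maxent_selectivity`, the `known` dictionary, and all numbers are hypothetical and chosen for illustration only.

```python
from itertools import product


def maxent_selectivity(known, n_preds, sweeps=500):
    """Return a function that estimates the selectivity of any conjunction.

    known: maps a frozenset of predicate indices to the known selectivity
           of their conjunction (hypothetical statistics, each in (0, 1)).
    """
    # One "atom" per truth assignment of the predicates, e.g. (1, 0, 1).
    atoms = list(product((0, 1), repeat=n_preds))
    # Start from the uniform distribution, the maximum-entropy point
    # when nothing is known.
    x = {a: 1.0 / len(atoms) for a in atoms}

    for _ in range(sweeps):
        for preds, target in known.items():
            # Current mass of all atoms that satisfy this conjunction.
            mass = sum(v for a, v in x.items() if all(a[i] for i in preds))
            for a in atoms:
                if all(a[i] for i in preds):
                    x[a] *= target / mass                    # match the statistic
                else:
                    x[a] *= (1.0 - target) / (1.0 - mass)    # keep total mass 1

    def selectivity(preds):
        return sum(v for a, v in x.items() if all(a[i] for i in preds))

    return selectivity


# Hypothetical statistics: three single-predicate selectivities and one
# multivariate statistic on predicates 0 and 1.
known = {
    frozenset({0}): 0.10,
    frozenset({1}): 0.20,
    frozenset({2}): 0.25,
    frozenset({0, 1}): 0.05,
}
est = maxent_selectivity(known, 3)
print(est({0, 1, 2}))  # estimate for the full conjunction

# With only single-predicate statistics, the result reduces to independence.
indep = maxent_selectivity(
    {frozenset({0}): 0.10, frozenset({1}): 0.20, frozenset({2}): 0.25}, 3
)
print(indep({0, 1, 2}))
```

Because every conjunction is read off the same fitted distribution, the estimate does not depend on which statistic the optimizer happens to apply first. In this sketch nothing links predicate 2 to the other two, so the full-conjunction estimate should approach the product of the joint statistic on predicates 0 and 1 with the selectivity of predicate 2.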

