Constructing Join Histograms from Histograms with q-error Guarantees

Kaleb Alway, A. Nica
{"title":"Constructing Join Histograms from Histograms with q-error Guarantees","authors":"Kaleb Alway, A. Nica","doi":"10.1145/2882903.2914828","DOIUrl":null,"url":null,"abstract":"Histograms are implemented and used in any database system, usually defined on a single-column of a database table. However, one of the most desired statistical data in such systems are statistics on the correlation among columns. In this paper we present a novel construction algorithm for building a join histogram that accepts two single-column histograms over different attributes, each with q-error guarantees, and produces a histogram over the result of the join operation on these attributes. The join histogram is built only from the input histograms without accessing the base data or computing the join relation. Under certain restrictions, a q-error guarantee can be placed on the produced join histogram. It is possible to construct adversarial input histograms that produce arbitrarily large q-error in the resulting join histogram, but across several experiments, this type of input does not occur in either randomly generated data or real-world data. Our construction algorithm runs in linear time with respect to the size of the input histograms, and produces a join histogram that is at most as large as the sum of the sizes of the input histograms. These join histograms can be used to efficiently and accurately estimate the cardinality of join queries.","PeriodicalId":20483,"journal":{"name":"Proceedings of the 2016 International Conference on Management of Data","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2016-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2016 International Conference on Management of Data","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2882903.2914828","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 6

Abstract

Histograms are implemented and used in any database system, usually defined on a single-column of a database table. However, one of the most desired statistical data in such systems are statistics on the correlation among columns. In this paper we present a novel construction algorithm for building a join histogram that accepts two single-column histograms over different attributes, each with q-error guarantees, and produces a histogram over the result of the join operation on these attributes. The join histogram is built only from the input histograms without accessing the base data or computing the join relation. Under certain restrictions, a q-error guarantee can be placed on the produced join histogram. It is possible to construct adversarial input histograms that produce arbitrarily large q-error in the resulting join histogram, but across several experiments, this type of input does not occur in either randomly generated data or real-world data. Our construction algorithm runs in linear time with respect to the size of the input histograms, and produces a join histogram that is at most as large as the sum of the sizes of the input histograms. These join histograms can be used to efficiently and accurately estimate the cardinality of join queries.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
从具有q-误差保证的直方图构造连接直方图
直方图可以在任何数据库系统中实现和使用,通常在数据库表的单列上定义。然而,在这样的系统中,最需要的统计数据之一是列之间相关性的统计数据。在本文中,我们提出了一种新的构建连接直方图的算法,该算法接受不同属性上的两个单列直方图,每个都有q-error保证,并在这些属性上的连接操作结果上生成直方图。连接直方图仅从输入直方图构建,而不访问基本数据或计算连接关系。在某些限制下,可以对生成的连接直方图进行q-error保证。可以构建对抗性输入直方图,在最终的连接直方图中产生任意大的q误差,但在几个实验中,这种类型的输入不会出现在随机生成的数据或实际数据中。我们的构建算法相对于输入直方图的大小在线性时间内运行,并产生最多与输入直方图大小总和一样大的连接直方图。这些连接直方图可用于有效而准确地估计连接查询的基数。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
An Experimental Comparison of Thirteen Relational Equi-Joins in Main Memory Rheem: Enabling Multi-Platform Task Execution Wander Join: Online Aggregation for Joins Graph Summarization for Geo-correlated Trends Detection in Social Networks Emma in Action: Declarative Dataflows for Scalable Data Analysis
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1