Random Sampling and Size Estimation Over Cyclic Joins

Y. Chen, K. Yi
{"title":"Random Sampling and Size Estimation Over Cyclic Joins","authors":"Y. Chen, K. Yi","doi":"10.4230/LIPIcs.ICDT.2020.7","DOIUrl":null,"url":null,"abstract":"Computing joins is expensive, and often unnecessary when the output size is large. In 1999, Chaudhuri et al. [7] posed the problem of random sampling over joins as a potentially effective approach to avoiding computing the join in full, while obtaining important statistical information about the join results. Unfortunately, no significant progress has been made in the last 20 years, except for the case of acyclic joins. In this paper, we present the first non-trivial result on sampling over cyclic joins. We show that after a linear-time preprocessing step, a join result can be drawn uniformly at random in expected time O(IN/OUT), where IN is known as the AGM bound of the join and OUT is its output size. This result holds for all joins on binary relations, as well as certain joins on relations of higher arity. We further show how this algorithm immediately leads to a join size estimation algorithm with the same running time. 2012 ACM Subject Classification Theory of computation → Database theory","PeriodicalId":90482,"journal":{"name":"Database theory-- ICDT : International Conference ... proceedings. International Conference on Database Theory","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"16","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Database theory-- ICDT : International Conference ... proceedings. International Conference on Database Theory","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.4230/LIPIcs.ICDT.2020.7","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 16

Abstract

Computing joins is expensive, and often unnecessary when the output size is large. In 1999, Chaudhuri et al. [7] posed the problem of random sampling over joins as a potentially effective approach to avoiding computing the join in full, while obtaining important statistical information about the join results. Unfortunately, no significant progress has been made in the last 20 years, except for the case of acyclic joins. In this paper, we present the first non-trivial result on sampling over cyclic joins. We show that after a linear-time preprocessing step, a join result can be drawn uniformly at random in expected time O(IN/OUT), where IN is known as the AGM bound of the join and OUT is its output size. This result holds for all joins on binary relations, as well as certain joins on relations of higher arity. We further show how this algorithm immediately leads to a join size estimation algorithm with the same running time. 2012 ACM Subject Classification Theory of computation → Database theory
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
循环连接上的随机抽样和大小估计
计算连接的成本很高,而且当输出大小很大时通常没有必要。1999年,Chaudhuri等人[7]提出了连接上的随机抽样问题,作为一种潜在的有效方法,可以避免完全计算连接,同时获得有关连接结果的重要统计信息。不幸的是,在过去的20年里,除了无环连接的情况外,没有取得重大进展。本文给出了循环连接上采样的第一个非平凡结果。我们表明,在线性时间预处理步骤之后,可以在预期时间0 (in /OUT)内均匀随机地绘制连接结果,其中in称为连接的AGM边界,OUT是其输出大小。这个结果适用于所有二元关系上的连接,以及某些更高密度关系上的连接。我们将进一步展示该算法如何立即生成具有相同运行时间的连接大小估计算法。2012 ACM学科分类:计算理论→数据库理论
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Generalizing Greenwald-Khanna Streaming Quantile Summaries for Weighted Inputs A Simple Algorithm for Consistent Query Answering under Primary Keys Size Bounds and Algorithms for Conjunctive Regular Path Queries Compact Data Structures Meet Databases (Invited Talk) Enumerating Subgraphs of Constant Sizes in External Memory
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1