Algorithmic Techniques for Independent Query Sampling

Proceedings of the 41st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems. Published: 2022-06-12. DOI: 10.1145/3517804.3526068

Yufei Tao

Citations: 6

Abstract

Unlike a reporting query, which returns all the elements satisfying a predicate, query sampling returns only a sample set of those elements and has long been recognized as an important method in database systems. PODS'14 saw the introduction of independent query sampling (IQS), which extends traditional query sampling with the requirement that the sample outputs of all the queries be mutually independent. The new requirement improves the precision of query estimation, facilitates the execution of randomized algorithms, and enhances the fairness and diversity of query answers. IQS calls for new index structures because conventional indexes are designed to report complete query answers and are thus too expensive when only a few random samples are needed. This has created an exciting opportunity to revisit the index structure for every reporting query known in computer science, and there has been considerable progress in this direction since 2014. This paper distills the existing solutions into several generic techniques that, when put together, can be used to solve a great variety of IQS problems with attractive performance guarantees.
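To make the contrast between reporting and independent sampling concrete, here is a minimal sketch in the simplest setting one could assume. The data is a static sorted list of keys, the queries are 1D ranges [lo, hi], and samples are drawn uniformly with replacement. The function names `range_report` and `range_sample` are hypothetical, and this is an illustration of the IQS requirement, not a technique taken from the paper.

Both functions locate the qualifying range by binary search. Reporting then pays for every one of the k matching keys. Sampling draws s positions with fresh randomness on each call, so its cost does not depend on k, and repeated queries return mutually independent samples.

```python
import bisect
import random


def range_report(sorted_keys, lo, hi):
    """Reporting query: return every key in [lo, hi]; cost grows with the output size k."""
    i = bisect.bisect_left(sorted_keys, lo)
    j = bisect.bisect_right(sorted_keys, hi)
    return sorted_keys[i:j]


def range_sample(sorted_keys, lo, hi, s, rng=random):
    """Sampling query: return s keys drawn uniformly with replacement from [lo, hi].

    Every call makes fresh random choices, so the outputs of different queries
    (even repeated identical ones) are mutually independent.
    Cost: O(log n + s), regardless of how many keys fall in [lo, hi].
    """
    i = bisect.bisect_left(sorted_keys, lo)
    j = bisect.bisect_right(sorted_keys, hi)
    if i == j:
        return []  # empty range: nothing to sample
    # randrange(i, j) picks a position in [i, j), so each draw is uniform over the range
    return [sorted_keys[rng.randrange(i, j)] for _ in range(s)]
```

For example, with `keys = list(range(0, 1000, 3))`, `range_report(keys, 100, 900)` returns all 267 qualifying keys. By contrast, `range_sample(keys, 100, 900, 5)` returns just 5 of them, and calling it again yields a new, independent set of 5. Sampling with replacement is chosen here only because it keeps each draw O(1) and trivially independent of the others.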
