{"title":"Independent range sampling","authors":"Xiaocheng Hu, Miao Qiao, Yufei Tao","doi":"10.1145/2594538.2594545","DOIUrl":null,"url":null,"abstract":"This paper studies the independent range sampling problem. The input is a set P of n points in R. Given an interval q = [x, y] and an integer t ≥ 1, a query returns t elements uniformly sampled (with/without replacement) from P ∩ q. The sampling result must be independent from those returned by the previous queries. The objective is to store P in a structure for answering all queries efficiently. If P fits in memory, the problem is interesting when P is dynamic (i.e., allowing insertions and deletions). The state of the art is a structure of O(n) space that answers a query in O(t log n) time, and supports an update in O(log n) time. We describe a new structure of O(n) space that answers a query in O(log n + t) expected time, and supports an update in O(log n) time. If P does not fit in memory, the problem is challenging even when P is static. The best known structure incurs O(logB n + t) I/Os per query, where B is the block size. We develop a new structure of O(n/B) space that answers a query in O(log* (n/B) + logB n + (t/B) logM/B (n/B)) amortized expected I/Os, where M is the memory size, and log* (n/B) is the number of iterative log2(.) operations we need to perform on n/B before going below a constant. We also give a lower bound argument showing that this is nearly optimal---in particular, the multiplicative term logM/B (n/B) is necessary.","PeriodicalId":302451,"journal":{"name":"Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems","volume":"7 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"35","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2594538.2594545","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 35
Abstract
This paper studies the independent range sampling problem. The input is a set P of n points in R. Given an interval q = [x, y] and an integer t ≥ 1, a query returns t elements uniformly sampled (with/without replacement) from P ∩ q. The sampling result must be independent from those returned by the previous queries. The objective is to store P in a structure for answering all queries efficiently. If P fits in memory, the problem is interesting when P is dynamic (i.e., allowing insertions and deletions). The state of the art is a structure of O(n) space that answers a query in O(t log n) time, and supports an update in O(log n) time. We describe a new structure of O(n) space that answers a query in O(log n + t) expected time, and supports an update in O(log n) time. If P does not fit in memory, the problem is challenging even when P is static. The best known structure incurs O(logB n + t) I/Os per query, where B is the block size. We develop a new structure of O(n/B) space that answers a query in O(log* (n/B) + logB n + (t/B) logM/B (n/B)) amortized expected I/Os, where M is the memory size, and log* (n/B) is the number of iterative log2(.) operations we need to perform on n/B before going below a constant. We also give a lower bound argument showing that this is nearly optimal---in particular, the multiplicative term logM/B (n/B) is necessary.