Efficient Dynamic Weighted Set Sampling and Its Extension

Fangyuan Zhang, Mengxu Jiang, Sibo Wang
Proc. VLDB Endow., journal article, published 2023-09-01. DOI: 10.14778/3617838.3617840 (https://doi.org/10.14778/3617838.3617840)
Citations: 0

Abstract

Given a weighted set $S$ of $n$ elements, weighted set sampling (WSS) samples an element of $S$ so that each element $a_i$ is sampled with probability proportional to its weight $w(a_i)$. The classic alias method builds an index in $O(n)$ time and $O(n)$ space and then answers each WSS query in $O(1)$ time. However, the alias method does not support dynamic updates. With minor modifications to existing dynamic WSS schemes, it is possible to achieve expected $O(1)$ update time and to draw $t$ independent samples in expected $O(t)$ time with linear space, which is theoretically optimal. Such a method is impractical, though, and is even slower than a binary search tree-based solution, so supporting both efficient sampling and efficient updates in practice remains challenging. Motivated by this, we design BUS, an efficient scheme that handles an update in $O(1)$ amortized time and draws $t$ independent samples in $O(\log n + t)$ time with linear space. A natural extension of WSS is weighted independent range sampling (WIRS), where each element of $S$ is a data point from $\mathbb{R}$. Given an arbitrary range $Q = [\ell, r]$ at query time, WIRS performs weighted set sampling on the set $S_Q$ of data points that fall into $Q$. We show that a scheme integrating the theoretically optimal dynamic WSS scheme mentioned above can handle an update in $O(\log n)$ time and draw $t$ independent samples for WIRS in $O(\log n + t)$ time, matching the state-of-the-art static algorithm. Again, this solution is impractical for handling WIRS queries. We therefore propose WIRS-BUS, which integrates BUS to handle WIRS queries; it handles each update in $O(\log n)$ time and draws $t$ independent samples in $O(\log^2 n + t)$ time with linear space. Extensive experiments show that BUS and WIRS-BUS are efficient for both sampling and updates.
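
To make the two baselines named in the abstract concrete, the following is a minimal Python sketch, not the paper's BUS scheme. `AliasTable` follows the well-known Vose variant of the alias method ($O(n)$ build, $O(1)$ per sample, no updates). `FenwickSampler` uses a Fenwick (binary indexed) tree as an assumed stand-in for the "binary search tree-based solution", giving $O(\log n)$ per update and per sample. Both class names and their interfaces are illustrative assumptions and do not come from the paper.

```python
import random
from typing import List, Sequence


class AliasTable:
    """Static weighted sampler (Vose's variant of the alias method)."""

    def __init__(self, weights: Sequence[float]) -> None:
        n = len(weights)
        total = float(sum(weights))
        if n == 0 or total <= 0 or any(w < 0 for w in weights):
            raise ValueError("weights must be non-negative with a positive sum")
        # Scale so that the average bucket holds probability mass exactly 1.
        scaled = [w * n / total for w in weights]
        self.prob: List[float] = [1.0] * n
        self.alias: List[int] = list(range(n))
        small = [i for i, p in enumerate(scaled) if p < 1.0]
        large = [i for i, p in enumerate(scaled) if p >= 1.0]
        while small and large:
            s, l = small.pop(), large.pop()
            self.prob[s] = scaled[s]       # bucket s keeps its own mass ...
            self.alias[s] = l              # ... and borrows the rest from l
            scaled[l] -= 1.0 - scaled[s]
            (small if scaled[l] < 1.0 else large).append(l)
        # Any leftover bucket keeps prob 1.0, absorbing floating-point drift.

    def sample(self, rng: random.Random) -> int:
        i = rng.randrange(len(self.prob))  # pick a bucket uniformly
        return i if rng.random() < self.prob[i] else self.alias[i]


class FenwickSampler:
    """Dynamic weighted sampler backed by a Fenwick tree of weights."""

    def __init__(self, weights: Sequence[float]) -> None:
        if len(weights) == 0:
            raise ValueError("need at least one element")
        self.n = len(weights)
        self.weights = [float(w) for w in weights]
        self.total = sum(self.weights)
        # O(n) build: push each node's partial sum into its parent.
        self.tree = [0.0] + self.weights
        for i in range(1, self.n + 1):
            parent = i + (i & -i)
            if parent <= self.n:
                self.tree[parent] += self.tree[i]

    def update(self, index: int, new_weight: float) -> None:
        """Set the weight of element `index` (0-based) in O(log n)."""
        delta = new_weight - self.weights[index]
        self.weights[index] = new_weight
        self.total += delta
        i = index + 1
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i

    def sample(self, rng: random.Random) -> int:
        """Return a 0-based index with probability weight / total."""
        target = rng.random() * self.total
        pos = 0
        step = 1 << (self.n.bit_length() - 1)
        # Descend to the largest pos whose prefix sum is <= target.
        while step:
            nxt = pos + step
            if nxt <= self.n and self.tree[nxt] <= target:
                pos = nxt
                target -= self.tree[nxt]
            step >>= 1
        return min(pos, self.n - 1)  # clamp guards against rounding at the end


if __name__ == "__main__":
    rng = random.Random(0)
    static = AliasTable([1, 2, 3, 4])
    dynamic = FenwickSampler([1, 2, 3, 4])
    dynamic.update(0, 10.0)  # the alias table would need a full rebuild here
    print([static.sample(rng) for _ in range(5)])
    print([dynamic.sample(rng) for _ in range(5)])
```

The contrast mirrors the trade-off the abstract describes: changing one weight in the alias table requires rebuilding it in $O(n)$ time, whereas the tree-based sampler pays $O(\log n)$ for every update and every sample.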