Maximum Coverage in Sublinear Space, Faster

Stephen Jaud, Anthony Wirth, F. Choudhury
DOI: 10.48550/arXiv.2302.06137
Published: 2023-02-13
Listed journal: Bulletin of the Society of Sea Water Science, Japan, vol. 37, no. 1, pp. 21:1-21:20
Citations: 1

Abstract

Given a collection of $m$ sets from a universe $\mathcal{U}$, the Maximum Set Coverage problem consists of finding $k$ sets whose union has the largest cardinality. This problem is NP-hard, but a polynomial-time algorithm approximates the optimal solution to within a factor of $1-1/e$. However, this algorithm does not scale well with the input size. In a streaming context, practical high-quality solutions exist, but their space complexity scales linearly with the size of the universe $|\mathcal{U}|$. One randomized streaming algorithm, by contrast, has been shown to produce a $1-1/e-\varepsilon$ approximation of the optimal solution with a space complexity that scales only poly-logarithmically with $m$ and $|\mathcal{U}|$. To achieve such a low space complexity, its authors used a technique called subsampling, based on hash functions with limited independence. This article focuses on this sublinear-space algorithm and introduces methods to reduce the time cost of subsampling. First, we show how to accelerate the algorithm by several orders of magnitude without altering its space complexity, number of passes, or approximation quality. Second, we derive a new lower bound on the probability of producing a $1-1/e-\varepsilon$ approximation using only pairwise independence: $1-\tfrac{4}{c k \log m}$, compared to the original $1-\tfrac{2e}{m^{ck/6}}$. Although the theoretical guarantees are weaker, for large streams our algorithm performs well in practice and presents the best time-space-performance trade-off for maximum coverage in streams.
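To make the two ingredients named in the abstract concrete, the following is a minimal offline sketch in Python: the classic greedy heuristic that attains the $1-1/e$ factor, and a subsampling step driven by a single pairwise-independent hash function. It is not the streaming algorithm studied in the article and it does not run in sublinear space. The function names, the prime modulus, and the `rate` parameter are assumptions chosen for illustration only.

```python
import random

# Assumption: element ids are non-negative integers smaller than this Mersenne prime.
PRIME = (1 << 61) - 1


def make_pairwise_hash(rng):
    """Draw h(x) = (a*x + b) mod PRIME with a, b uniform in [0, PRIME).

    This is the textbook pairwise-independent family over the integers mod PRIME.
    """
    a = rng.randrange(PRIME)
    b = rng.randrange(PRIME)
    return lambda x: (a * x + b) % PRIME


def subsample(sets, rate, rng):
    """Keep each universe element with probability roughly `rate`.

    One hash function decides the fate of an element in every set,
    so an element is either kept everywhere or dropped everywhere.
    """
    h = make_pairwise_hash(rng)
    threshold = int(rate * PRIME)
    return [{x for x in s if h(x) < threshold} for s in sets]


def greedy_max_coverage(sets, k):
    """Pick up to k sets, each time taking the one that covers the most new elements."""
    covered, chosen = set(), []
    for _ in range(min(k, len(sets))):
        best = max(range(len(sets)), key=lambda i: len(sets[i] - covered))
        if not sets[best] - covered:
            break  # no remaining set adds anything new
        chosen.append(best)
        covered |= sets[best]
    return chosen, covered


sets = [{1, 2, 3}, {3, 4}, {4, 5, 6, 7}, {1, 7}]
chosen, covered = greedy_max_coverage(sets, k=2)
print(chosen, len(covered))  # chosen == [2, 0], all 7 elements covered

# Run the same greedy on a subsampled instance (hypothetical rate of 0.5).
small = subsample(sets, rate=0.5, rng=random.Random(0))
print(greedy_max_coverage(small, k=2)[0])
```

The intuition behind subsampling is that coverage on the reduced universe, rescaled by `1 / rate`, estimates coverage on the full universe, so the selection step only has to track the sampled elements. How the rate is chosen, and how much hash independence the guarantee requires, is what the article analyses.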