Streaming Lower Bounds and Asymmetric Set-Disjointness

Shachar Lovett, Jiapeng Zhang
Journal: Electron. Colloquium Comput. Complex.
Published: 2023-01-13
DOI: 10.48550/arXiv.2301.05658 (https://doi.org/10.48550/arXiv.2301.05658)
Citations: 1

Abstract

Frequency estimation in data streams is one of the classical problems in streaming algorithms. After much research, there are now almost matching upper and lower bounds for the trade-off between the number of samples and the space complexity of the algorithm when the data streams are adversarial. However, when the data stream is given in a random order, or is stochastic, only weaker lower bounds exist. In this work we close this gap, up to logarithmic factors.

To do so, we consider the needle problem, a natural hard problem for frequency estimation studied in (Andoni et al. 2008, Crouch et al. 2016). Here, the goal is to distinguish between two distributions over data streams with $t$ samples. The first is uniform over a large enough domain. The second is a planted model: a secret "needle" is chosen uniformly, and then each element in the stream equals the needle with probability $p$ and is otherwise chosen uniformly from the domain. It is simple to design streaming algorithms that distinguish the distributions using space $s \approx 1/(p^2 t)$. It was unclear whether this is tight, as the existing lower bounds are weaker. We close this gap and show that the trade-off is near optimal, up to a logarithmic factor.

Our proof builds on and extends classical connections between streaming algorithms and communication complexity, concretely multi-party unique set-disjointness. We introduce two new ingredients that allow us to prove sharp bounds. The first is a lower bound for an asymmetric version of multi-party unique set-disjointness, where players receive input sets of different sizes, and where the communication of each player is normalized relative to their input length. The second is a combinatorial technique that allows us to sample needles in the planted model by first sampling intervals and then sampling a uniform needle in each interval.
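The two distributions in the needle problem, and one simple way to tell them apart with little memory, can be sketched in Python as follows. This is an illustration only, not the algorithm or analysis from the paper: the function names, the chunked repeat detector, and the `window` parameter are all assumptions introduced here.

```python
import random
from typing import Iterable, Iterator


def uniform_stream(t: int, n: int, rng: random.Random) -> Iterator[int]:
    """Yield t samples drawn uniformly from the domain {0, ..., n-1}."""
    for _ in range(t):
        yield rng.randrange(n)


def planted_stream(t: int, n: int, p: float, rng: random.Random) -> Iterator[int]:
    """Yield t samples; each equals a secret needle with probability p,
    and is otherwise uniform over the domain."""
    needle = rng.randrange(n)  # secret needle, chosen uniformly
    for _ in range(t):
        yield needle if rng.random() < p else rng.randrange(n)


def looks_planted(stream: Iterable[int], s: int, window: int) -> bool:
    """Hypothetical chunked repeat detector (an assumption of this sketch).

    The stream is cut into chunks of s + window elements. The first s
    elements of each chunk are remembered, so at most s elements are held
    at any time. The detector returns True as soon as one of the following
    `window` elements of the same chunk matches a remembered element.
    """
    remembered = set()
    pos = 0  # position inside the current chunk
    for x in stream:
        if pos < s:
            remembered.add(x)
        elif x in remembered:
            return True
        pos += 1
        if pos == s + window:
            # Chunk finished: forget everything and start a new chunk.
            remembered.clear()
            pos = 0
    return False
```

A back-of-the-envelope calculation suggests that choosing `window` on the order of $1/p$ and `s` on the order of $1/(p^2 t)$ gives a constant expected number of chunks in which the needle is both remembered and seen again, which matches the space bound quoted in the abstract. The domain size `n` should be much larger than $s \cdot t$ so that accidental repeats under the uniform distribution stay rare.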