Beating CountSketch for heavy hitters in insertion streams

V. Braverman, Stephen R. Chestnut, Nikita Ivkin, David P. Woodruff
Proceedings of the forty-eighth annual ACM symposium on Theory of Computing
Published: 2015-11-02
DOI: 10.1145/2897518.2897558 (https://doi.org/10.1145/2897518.2897558)
Citations: 43

Abstract

Given a stream $p_1, \ldots, p_m$ of items from a universe $U$, which, without loss of generality, we identify with the set of integers $\{1, 2, \ldots, n\}$, we consider the problem of returning all $\ell_2$-heavy hitters, i.e., those items $j$ for which $f_j \ge \epsilon \sqrt{F_2}$, where $f_j$ is the number of occurrences of item $j$ in the stream, and $F_2 = \sum_{i \in [n]} f_i^2$. Such a guarantee is considerably stronger than the $\ell_1$-guarantee, which finds those $j$ for which $f_j \ge \epsilon m$. In 2002, Charikar, Chen, and Farach-Colton suggested the CountSketch data structure, which finds all such $j$ using $\Theta(\log^2 n)$ bits of space (for constant $\epsilon > 0$). The only known lower bound is $\Omega(\log n)$ bits of space, which comes from the need to specify the identities of the items found. In this paper we show one can achieve $O(\log n \log\log n)$ bits of space for this problem. Our techniques, based on Gaussian processes, lead to a number of other new results for data streams, including: (1) The first algorithm for estimating $F_2$ simultaneously at all points in a stream using only $O(\log n \log\log n)$ bits of space, improving a natural union bound. (2) A way to estimate the $\ell_\infty$ norm of a stream up to additive error $\epsilon \sqrt{F_2}$ with $O(\log n \log\log n)$ bits of space, resolving Open Question 3 from the IITK 2006 list for insertion-only streams.
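To make the two guarantees concrete, the following Python sketch computes the ℓ2- and ℓ1-heavy hitters of a small stream exactly, then estimates a frequency with a textbook CountSketch. It is an illustration under stated assumptions, not the algorithm of this paper: the function names, the table dimensions, and the use of Python's built-in `hash` with random salts in place of pairwise-independent hash functions are all choices made here for brevity.

```python
import random
from collections import Counter
from math import sqrt
from statistics import median


def exact_l2_heavy_hitters(stream, eps):
    """Items j with f_j >= eps * sqrt(F_2), computed exactly in linear space."""
    freq = Counter(stream)
    f2 = sum(f * f for f in freq.values())
    return {j for j, f in freq.items() if f >= eps * sqrt(f2)}


def exact_l1_heavy_hitters(stream, eps):
    """Items j with f_j >= eps * m, where m is the stream length."""
    freq = Counter(stream)
    return {j for j, f in freq.items() if f >= eps * len(stream)}


class CountSketch:
    """Textbook CountSketch: `rows` independent rows of `width` signed counters."""

    def __init__(self, rows, width, seed=0):
        rng = random.Random(seed)
        self.rows, self.width = rows, width
        self.table = [[0] * width for _ in range(rows)]
        # Assumption: salted built-in hashing stands in for the
        # pairwise-independent hash families used in the analysis.
        self.salts = [(rng.getrandbits(64), rng.getrandbits(64))
                      for _ in range(rows)]

    def _bucket(self, r, item):
        return hash((self.salts[r][0], item)) % self.width

    def _sign(self, r, item):
        return 1 if hash((self.salts[r][1], item)) & 1 else -1

    def update(self, item):
        # Insertion-only update: add the item's random sign to one counter per row.
        for r in range(self.rows):
            self.table[r][self._bucket(r, item)] += self._sign(r, item)

    def estimate(self, item):
        # Median over rows of the sign-corrected counter the item maps to.
        return median(self._sign(r, item) * self.table[r][self._bucket(r, item)]
                      for r in range(self.rows))


# One item occurring 100 times among 2500 distinct singletons:
# m = 2600 and F_2 = 100**2 + 2500 = 12500, so sqrt(F_2) is about 111.8.
stream = [0] * 100 + list(range(1, 2501))

print(exact_l2_heavy_hitters(stream, 0.5))  # {0}: 100 >= 0.5 * 111.8
print(exact_l1_heavy_hitters(stream, 0.5))  # set(): 100 < 0.5 * 2600

sketch = CountSketch(rows=5, width=256)
for p in stream:
    sketch.update(p)
print(sketch.estimate(0))  # an approximation of f_0 = 100
```

In this example item 0 is an ℓ2-heavy hitter but not an ℓ1-heavy hitter for the same ε, which is the sense in which the ℓ2 guarantee is stronger. The sketch stores `rows * width` counters; with O(log n) rows of O(log n)-bit counters this matches the Θ(log² n) bits quoted above for constant ε.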