An Optimal Algorithm for ℓ1-Heavy Hitters in Insertion Streams and Related Problems

Arnab Bhattacharyya, P. Dey, David P. Woodruff
ACM Transactions on Algorithms (TALG), published 2018-10-22. DOI: 10.1145/3264427 (https://doi.org/10.1145/3264427)
Citations: 12

Abstract

We give the first optimal bounds for returning the $\ell_1$-heavy hitters in a data stream of insertions, together with their approximate frequencies, closing a long line of work on this problem. For a stream of $m$ items in $\{1, 2, \ldots, n\}$ and parameters $0 < \varepsilon < \varphi \leq 1$, let $f_i$ denote the frequency of item $i$, i.e., the number of times item $i$ occurs in the stream. With arbitrarily large constant probability, our algorithm returns all items $i$ for which $f_i \geq \varphi m$, returns no items $j$ for which $f_j \leq (\varphi - \varepsilon) m$, and returns approximations $\tilde{f}_i$ with $|\tilde{f}_i - f_i| \leq \varepsilon m$ for each item $i$ that it returns. Our algorithm uses $O(\varepsilon^{-1} \log \varphi^{-1} + \varphi^{-1} \log n + \log \log m)$ bits of space, processes each stream update in $O(1)$ worst-case time, and can report its output in time linear in the output size. We also prove a lower bound, which implies that our algorithm is optimal up to a constant factor in its space complexity. A modification of our algorithm can be used to estimate the maximum frequency up to an additive $\varepsilon m$ error in the above amount of space, resolving Question 3 in the IITK 2006 Workshop on Algorithms for Data Streams for the case of $\ell_1$-heavy hitters. We also introduce several variants of the heavy hitters and maximum frequency problems, inspired by rank aggregation and voting schemes, and show how our techniques can be applied in such settings. Unlike the traditional heavy hitters problem, some of these variants look at comparisons between items rather than numerical values to determine the frequency of an item.
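To make the guarantee in the abstract concrete, here is a minimal sketch of the (ε, φ)-heavy hitters contract. It is not the algorithm of this paper: it uses the classic Misra-Gries summary as a stand-in, which is deterministic but needs roughly ε⁻¹ counters of log n + log m bits each, more than the optimal bound stated above, and its update step is not O(1) in the worst case. The function names `misra_gries` and `heavy_hitters` and the toy stream are assumptions made for illustration.

```python
import math


def misra_gries(stream, eps):
    """Misra-Gries summary with k = ceil(1/eps) counters (a classic baseline).

    Returns (counters, m). Every counter c_i satisfies
    f_i - eps*m < c_i <= f_i, where an untracked item counts as c_i = 0.
    """
    k = math.ceil(1 / eps)
    counters = {}
    m = 0
    for item in stream:
        m += 1
        if item in counters:
            counters[item] += 1
        elif len(counters) < k:
            counters[item] = 1
        else:
            # Table full: discard the new item and decrement every counter.
            # This step costs O(k) time, unlike the O(1) worst-case update
            # time claimed in the abstract.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters, m


def heavy_hitters(stream, phi, eps):
    """Report items whose estimate exceeds (phi - eps) * m.

    Items with f_i >= phi*m have c_i > (phi - eps)*m, so they are reported;
    items with f_j <= (phi - eps)*m have c_j <= f_j, so they are not.
    """
    counters, m = misra_gries(stream, eps)
    threshold = (phi - eps) * m
    return {item: c for item, c in counters.items() if c > threshold}


if __name__ == "__main__":
    # Hypothetical toy stream: "a" x40, "b" x20, and 20 distinct singletons (m = 80).
    stream = ["a", "b", "a"] * 20 + [f"x{j}" for j in range(20)]
    print(heavy_hitters(stream, phi=0.25, eps=0.125))
```

With φ = 0.25 and ε = 0.125 on this stream, φm = 20 and εm = 10, so "a" and "b" must be reported with estimates within 10 of their true counts, and no singleton may be reported.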