Near-optimal approximate membership query over time-decaying windows

Yang Liu, Wenji Chen, Y. Guan
Published in: 2013 Proceedings IEEE INFOCOM (2013-04-14)
DOI: https://doi.org/10.1109/INFCOM.2013.6566939
Citations: 20

Abstract

There is a long history of work on space-efficient data structures for approximate membership queries, starting with Bloom's work in the 1970s. Given a set A of n items and an additional item x from the same universe U of size m ≫ n, we want to decide whether x ∈ A using only a small (limited) amount of space. Solutions to the membership query are needed in many network applications, such as cache directories, load balancing, and security. If A is static, there exist optimal algorithms that build a randomized data structure representing A in only (1 + o(1)) n log(1/δ) bits, which allows a small false positive rate δ but no false negatives. However, existing optimal algorithms are not practical for many Internet applications, e.g., social network services, peer-to-peer systems, and network traffic monitoring. They are too space- and time-expensive when the set A changes frequently, because every change requires all items in order to recompute the optimal data structure, which takes linear time. In this paper, we propose a novel data structure that supports approximate membership queries in the time-decaying window model. In this model, items are inserted one by one over a data stream, and we want to determine whether an item is among the most recent w items for any given window size w ≤ n. Our data structure requires only O(n(log 1/δ + log n)) bits and O(1) running time. We also prove a non-trivial space lower bound, i.e., (n - δm) log(n - δm) bits, which guarantees that our data structure is near-optimal. Our data structure has been evaluated using both synthetic and real data sets.
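To make the query model concrete, here is a minimal Python sketch of approximate membership over a time-decaying window. It is not the data structure proposed in the paper, whose details the abstract does not give; it is a simple fingerprint-plus-timestamp scheme chosen only for illustration. The class name `WindowedFingerprintSet`, its method names, and the choice of BLAKE2b as the hash are all assumptions. The sketch keeps one fingerprint of about log(n/δ) bits and one arrival index per live item, so it never reports a false negative for an item among the last w ≤ n arrivals, and under a uniform-hashing assumption a union bound over at most n live fingerprints keeps the false positive probability below δ.

```python
import hashlib
import math
from collections import deque


class WindowedFingerprintSet:
    """Illustrative approximate membership over the most recent n stream items."""

    def __init__(self, n: int, delta: float) -> None:
        if n < 1 or not 0.0 < delta < 1.0:
            raise ValueError("need n >= 1 and 0 < delta < 1")
        self.n = n
        # ceil(log2(n / delta)) fingerprint bits: at most n fingerprints are
        # live, so a union bound keeps the collision probability below delta
        # (assuming the hash behaves like a uniform random function).
        self.bits = math.ceil(math.log2(n / delta))
        self.clock = 0            # number of items inserted so far
        self.last_seen = {}       # fingerprint -> latest arrival index
        self.arrivals = deque()   # (arrival index, fingerprint), oldest first

    def _fingerprint(self, item: str) -> int:
        digest = hashlib.blake2b(item.encode("utf-8"), digest_size=16).digest()
        return int.from_bytes(digest, "big") % (1 << self.bits)

    def insert(self, item: str) -> None:
        """Record one stream arrival in O(1) amortized time."""
        self.clock += 1
        fp = self._fingerprint(item)
        self.last_seen[fp] = self.clock
        self.arrivals.append((self.clock, fp))
        # Expire arrivals that have fallen out of the largest window n.
        while self.clock - self.arrivals[0][0] >= self.n:
            ts, old_fp = self.arrivals.popleft()
            # Delete only if no newer arrival refreshed this fingerprint.
            if self.last_seen.get(old_fp) == ts:
                del self.last_seen[old_fp]

    def query(self, item: str, w: int) -> bool:
        """Return True if item may be among the most recent w arrivals."""
        if not 1 <= w <= self.n:
            raise ValueError("window size must satisfy 1 <= w <= n")
        ts = self.last_seen.get(self._fingerprint(item))
        return ts is not None and self.clock - ts < w
```

For example, after inserting "a", "b", "c", "d" into `WindowedFingerprintSet(n=3, delta=1e-9)`, `query("c", 2)` must return True because "c" is the second most recent item, while `query("a", 3)` returns False unless a fingerprint collision occurs. Note that the arrival indices here are unbounded Python integers and the containers carry interpreter overhead; a compact implementation would store counters wrapped to O(log n) bits, which is where a per-item cost of the form log 1/δ + log n would come from.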