QPOPSS: Query and Parallelism Optimized Space-Saving for Finding Frequent Stream Elements

Victor Jarlow, Charalampos Stylianopoulos, Marina Papatriantafilou
arXiv - CS - Distributed, Parallel, and Cluster Computing, published 2024-09-03
DOI: arxiv-2409.01749 (https://doi.org/arxiv-2409.01749)

Abstract

The frequent elements problem, a key component in demanding stream-data analytics, involves selecting elements whose occurrence exceeds a user-specified threshold. Fast, memory-efficient $\epsilon$-approximate synopsis algorithms select all frequent elements but may overestimate them depending on $\epsilon$ (a user-defined parameter). Evolving applications demand performance only achievable by parallelization. However, algorithmic guarantees concerning concurrent updates and queries have been overlooked. We propose Query and Parallelism Optimized Space-Saving (QPOPSS), providing concurrency guarantees. The design includes an implementation of the Space-Saving algorithm supporting fast queries, implying minimal overlap with concurrent updates. QPOPSS integrates this with the distribution of work and fine-grained synchronization among threads, swiftly balancing high throughput, high accuracy, and low memory consumption. Our analysis, under various concurrency and data distribution conditions, shows space and approximation bounds. Our empirical evaluation relative to representative state-of-the-art methods reveals that QPOPSS's multi-threaded throughput scales linearly while maintaining the highest accuracy, with orders of magnitude smaller memory footprint.
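For readers unfamiliar with the underlying synopsis, the following is a minimal single-threaded sketch of the classic Space-Saving counter scheme that the abstract names as QPOPSS's building block. It is not the authors' implementation: it omits the query optimization, work distribution, and thread synchronization that the abstract describes. The class name `SpaceSaving`, the methods `update` and `frequent`, and the threshold parameter `phi` are illustrative assumptions, and sizing the summary at $\lceil 1/\epsilon \rceil$ counters is the usual textbook choice, not something stated in the abstract.

```python
from math import ceil


class SpaceSaving:
    """Textbook sequential Space-Saving summary (illustrative, not QPOPSS)."""

    def __init__(self, epsilon):
        # Assumption: ceil(1/epsilon) counters, the standard sizing.
        self.capacity = ceil(1 / epsilon)
        self.counts = {}  # element -> estimated count (never an underestimate)
        self.errors = {}  # element -> maximum possible overestimation
        self.n = 0        # number of stream elements seen so far

    def update(self, x):
        self.n += 1
        if x in self.counts:
            # Monitored element: increment its counter.
            self.counts[x] += 1
        elif len(self.counts) < self.capacity:
            # Free counter available: start monitoring x exactly.
            self.counts[x] = 1
            self.errors[x] = 0
        else:
            # Summary full: evict a minimum-count element and let x
            # inherit its count, recording that count as x's error.
            victim = min(self.counts, key=self.counts.get)
            inherited = self.counts.pop(victim)
            del self.errors[victim]
            self.counts[x] = inherited + 1
            self.errors[x] = inherited

    def frequent(self, phi):
        # Report monitored elements whose estimate exceeds phi * n.
        threshold = phi * self.n
        return {x: c for x, c in self.counts.items() if c > threshold}


if __name__ == "__main__":
    ss = SpaceSaving(epsilon=0.25)
    for item in ["a"] * 6 + ["b"] * 3 + ["c", "d", "e"]:
        ss.update(item)
    print(ss.frequent(phi=0.25))
```

In this sketch every estimate is at least the true count and exceeds it by no more than the recorded error, which is why the summary can report all truly frequent elements while possibly overestimating some, as the abstract notes. The linear scan for the minimum is chosen for brevity only; practical implementations keep the counters in a structure that finds the minimum faster.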