MimoSketch: A Framework for Frequency-Based Mining Tasks on Multiple Nodes With Sketches

IEEE Transactions on Knowledge and Data Engineering (IF 10.4; CAS Zone 2, Computer Science; JCR Q1, Computer Science, Artificial Intelligence) Pub Date: 2024-12-26 DOI: 10.1109/TKDE.2024.3523034
Wenfei Wu;Yuchen Xu
Volume 37, issue 3, pages 1311-1324. Publication type: Journal Article. Open access: no. Source: https://ieeexplore.ieee.org/document/10816464/
Citations: 0

Abstract

In distributed data stream mining, we abstract a MIMO scenario in which a stream of multiple items is mined by multiple nodes. We design a framework named MimoSketch for this scenario, which improves five fundamental mining tasks: item frequency estimation, item size distribution estimation, heavy hitter detection, heavy change detection, and entropy estimation. MimoSketch consists of an algorithm design and a policy for scheduling items to nodes. The algorithm applies random counting to preserve a mathematically proven unbiasedness property, which makes it well suited to aggregate queries across multiple nodes; its memory layout adapts dynamically to the runtime item size distribution, which maximizes estimation accuracy by storing more items. The scheduling policy balances items among nodes so that no node is overloaded or underloaded, which improves overall mining accuracy. Our prototype and evaluation show that the algorithm can improve the accuracy of five typical mining tasks by an order of magnitude compared with state-of-the-art solutions, and that the scheduling policy further improves performance in MIMO scenarios.
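The abstract does not spell out MimoSketch's counting rule, so the following is only an illustrative sketch of why an unbiased estimator suits aggregate queries over several nodes. It uses a single row of the classic Count Sketch (signed counters), which is a stand-in technique and not the paper's algorithm. The names `CountSketchRow`, `aggregate_estimate`, `width`, and `salt` are all hypothetical. The idea shown: each node's estimate is unbiased over the choice of hash functions, so the sum of per-node estimates is an unbiased estimate of the item's total frequency.

```python
import hashlib


def _hash(key: str, salt: str) -> int:
    """Deterministic 64-bit hash of (salt, key)."""
    digest = hashlib.blake2b(f"{salt}:{key}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


class CountSketchRow:
    """One row of a Count Sketch (illustrative stand-in, not MimoSketch)."""

    def __init__(self, width: int, salt: str) -> None:
        self.width = width
        self.salt = salt
        self.counters = [0] * width

    def _slot(self, key: str) -> int:
        # Counter index that this key maps to.
        return _hash(key, self.salt + "/slot") % self.width

    def _sign(self, key: str) -> int:
        # Pseudo-random +1/-1 sign; colliding keys cancel in expectation.
        return 1 if _hash(key, self.salt + "/sign") & 1 else -1

    def update(self, key: str, count: int = 1) -> None:
        self.counters[self._slot(key)] += self._sign(key) * count

    def estimate(self, key: str) -> int:
        # Unbiased over the random choice of hash functions.
        return self._sign(key) * self.counters[self._slot(key)]


def aggregate_estimate(nodes: list[CountSketchRow], key: str) -> int:
    """Aggregate query: sum the per-node estimates for one item."""
    return sum(node.estimate(key) for node in nodes)


if __name__ == "__main__":
    # Three nodes; items are spread round-robin (a toy scheduling choice).
    nodes = [CountSketchRow(width=64, salt=f"node-{i}") for i in range(3)]
    stream = ["a"] * 50 + ["b"] * 30 + ["c"] * 20
    for i, item in enumerate(stream):
        nodes[i % len(nodes)].update(item)
    for key in ("a", "b", "c"):
        print(key, aggregate_estimate(nodes, key))
```

Because the errors of unbiased estimators have zero mean, they do not accumulate systematically when summed across nodes, whereas an estimator that always overestimates would add one positive error per node. When all nodes share the same `salt`, summing per-node estimates is also equivalent to merging the counter arrays element-wise and querying once.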
Source journal
IEEE Transactions on Knowledge and Data Engineering (Engineering and Technology; Engineering: Electrical and Electronic)
CiteScore: 11.70
Self-citation rate: 3.40%
Articles published: 515
Review time: 6 months
About the journal: The IEEE Transactions on Knowledge and Data Engineering encompasses knowledge and data engineering aspects within computer science, artificial intelligence, electrical engineering, computer engineering, and related fields. It provides an interdisciplinary platform for disseminating new developments in knowledge and data engineering and explores the practicality of these concepts in both hardware and software. Specific areas covered include knowledge-based and expert systems, AI techniques for knowledge and data management, tools and methodologies, distributed processing, real-time systems, architectures, data management practices, database design, query languages, security, fault tolerance, statistical databases, algorithms, performance evaluation, and applications.