Fully decentralized computation of aggregates over data streams

StreamKDD '10, Pub Date: 2011-03-31, DOI: 10.1145/1833280.1833281

L. Becchetti, Ilaria Bordino, S. Leonardi, A. Rosén

Citations: 2

Abstract

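In several emerging applications, data is collected in massive streams at several distributed points of observation. A basic and challenging task is to allow every node to monitor a neighbourhood of interest by issuing continuous aggregate queries on the streams observed in its vicinity. This class of algorithms is fully decentralized and diffusive in nature: collecting all data at a few central nodes of the network is infeasible in networks of low-capability devices or in the presence of massive data sets. The main difficulty in designing diffusive algorithms is coping with duplicate detections. These arise from the observation of the same event at several nodes of the network and/or from the receipt of the same aggregated information along multiple paths of diffusion. In this paper, we consider fully decentralized algorithms that locally answer continuous aggregate queries on the number of distinct events, the total number of events, and the second frequency moment in the scenario outlined above. In the worst case or on realistic distributions, the proposed algorithms use sublinear space at every node. We also propose strategies that minimize the communication needed to update the aggregates when new events are observed. Finally, we present an experimental analysis that provides evidence for the efficiency and accuracy of our algorithms in realistic simulated scenarios.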
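The abstract does not describe the paper's data structures, so the following is only an illustrative sketch of why duplicate-insensitive summaries help with the two sources of duplicates it names. It uses a k-minimum-values (KMV) distinct-count synopsis, a standard technique substituted here for illustration and not necessarily the authors' method. Its merge (union, then keep the k smallest hash values) is idempotent, commutative and associative, so an event seen at several nodes, or a sketch received along several diffusion paths, is counted once. The names `KMVSketch` and `event_hash`, the SHA-1 based hash shared by all nodes, string event identifiers, and `k = 256` are all assumptions.

```python
import hashlib
import heapq

HASH_RANGE = 2**64  # hash values are integers in [1, 2**64]


def event_hash(event_id: str) -> int:
    """Map an event identifier to a pseudo-random integer in [1, 2**64].

    Assumption: every node uses this same SHA-1 based hash, so the same
    event gets the same value everywhere (Python's built-in hash() is
    salted per process for strings and would not work across nodes).
    """
    digest = hashlib.sha1(event_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") + 1


class KMVSketch:
    """Hypothetical duplicate-insensitive distinct-count synopsis.

    Keeps only the k smallest distinct hash values observed (KMV).
    """

    def __init__(self, k: int = 256) -> None:
        self.k = k
        self.values = set()  # the k smallest distinct hash values seen so far

    def _absorb(self, new_values) -> None:
        # Union with the incoming values, then keep only the k smallest.
        self.values = set(heapq.nsmallest(self.k, self.values | set(new_values)))

    def add(self, event_id: str) -> None:
        """Record a locally observed event; repeated events have no effect."""
        self._absorb([event_hash(event_id)])

    def merge(self, other: "KMVSketch") -> None:
        """Fold in a neighbour's sketch.

        Because union-then-keep-k-smallest is idempotent, receiving the same
        sketch along several diffusion paths leaves the result unchanged.
        """
        self._absorb(other.values)

    def estimate(self) -> float:
        """Estimate the number of distinct events summarised by the sketch."""
        if len(self.values) < self.k:
            return float(len(self.values))  # fewer than k distinct hashes: exact
        # With n distinct events, the k-th smallest hash sits near k/n of the range.
        return (self.k - 1) * HASH_RANGE / max(self.values)


# Node A sees events 0..5999, node B sees events 4000..9999 (2000 overlap).
a, b = KMVSketch(), KMVSketch()
for i in range(6000):
    a.add(f"event-{i}")
for i in range(4000, 10000):
    b.add(f"event-{i}")

# Node C receives A's sketch along two different paths, plus B's sketch.
c = KMVSketch()
for incoming in (a, b, a):
    c.merge(incoming)
print(round(c.estimate()))  # near 10000; summing local counts would give 18000
```

The design choice that matters here is the merge operator: because it is idempotent, a node never needs to track which neighbour or path an aggregate came from, at the price of returning an approximate count whose relative error shrinks roughly as 1/sqrt(k).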

Source journal

StreamKDD '10

Latest papers in this journal

Fully decentralized computation of aggregates over data streams
CALDS: context-aware learning from data streams
Towards subspace clustering on dynamic data: an incremental version of PreDeCon
Research issues in mining multiple data streams
Conformal prediction for distribution-independent anomaly detection in streaming vessel data