Optimizing Windowed Aggregation over Geo-Distributed Data Streams
Authors: Hooman Peiro Sajjad, Y. Liu, Vladimir Vlassov
Venue: 2018 IEEE International Conference on Edge Computing (EDGE)
Publication date: 2018-07-01
DOI: 10.1109/EDGE.2018.00012 (https://doi.org/10.1109/EDGE.2018.00012)
Citations: 6
Abstract
Real-time data analytics is essential as more and more applications require timely online decision making. However, efficient analysis of geo-distributed data streams is challenging, because most analytic tasks require data to be collected from all edge data centers, each of which aggregates data from local sources. As a result, edge data centers must frequently transfer data to a central data center over a wide area network, which is expensive. In this paper, we advocate a coordinated approach among edge data centers to handle these analytic tasks efficiently and thereby reduce the communication cost among data centers. We focus on windowed aggregation of data streams, which is widely used in stream analytics. In general, aggregating data streams among edge data centers in the same region reduces the amount of data that needs to be sent over cross-region communication links. Building on state-of-the-art research, we leverage intra-region links and design a low-overhead coordination algorithm that optimizes communication cost for data aggregation. We evaluate our algorithm using synthetic and Big Data Benchmark datasets. The evaluation results show that our algorithm reduces bandwidth cost by up to ~6x compared to the state-of-the-art solution.
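The claim that aggregating within a region reduces cross-region traffic can be illustrated with a small sketch. This is not the paper's coordination algorithm, which the abstract does not describe in detail. It is a toy model under stated assumptions: tumbling windows, a sum aggregate (which can be merged in any order), and the number of partial-aggregate records as a stand-in for bandwidth cost. All function names, region names, and sample events are hypothetical.

```python
from collections import defaultdict


def window_partials(events, window_size):
    """Edge side: sum values per (tumbling window start, key)."""
    partials = defaultdict(int)
    for timestamp, key, value in events:
        window_start = (timestamp // window_size) * window_size
        partials[(window_start, key)] += value
    return dict(partials)


def merge_partials(partials_list):
    """Combine partial sums; valid because addition is associative and commutative."""
    merged = defaultdict(int)
    for partials in partials_list:
        for window_key, value in partials.items():
            merged[window_key] += value
    return dict(merged)


# Hypothetical events as (timestamp, key, value), one list per edge data center.
WINDOW = 10
eu_a = [(1, "x", 2), (3, "y", 1), (12, "x", 4)]
eu_b = [(2, "x", 5), (11, "x", 1), (14, "y", 3)]
us_a = [(0, "x", 1), (9, "x", 1)]
us_b = [(5, "x", 7)]

# Partial aggregates computed locally at each edge, grouped by region.
regions = {
    "eu": [window_partials(eu_a, WINDOW), window_partials(eu_b, WINDOW)],
    "us": [window_partials(us_a, WINDOW), window_partials(us_b, WINDOW)],
}

# Baseline: every edge ships its own partials over cross-region links.
all_edges = [p for edges in regions.values() for p in edges]
direct_records = sum(len(p) for p in all_edges)

# Coordinated: edges in a region merge over intra-region links first,
# then one merged set of partials per region crosses regions.
regional = [merge_partials(edges) for edges in regions.values()]
regional_records = sum(len(p) for p in regional)

# Both paths must produce the same global aggregate.
global_direct = merge_partials(all_edges)
global_regional = merge_partials(regional)
```

With this sample data, the baseline sends 8 records across regions and the regional merge sends 5, while the final aggregate is identical. The saving depends on how many (window, key) pairs the edges in a region share. Edges with disjoint keys gain nothing from merging, which is one reason a coordination step that decides when to merge is useful.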