Group Mutual Exclusion to Scale Distributed Stream Processing Pipelines

Mehdi Belkhiria, M. Bertier, Cédric Tedeschi
2020 IEEE/ACM 13th International Conference on Utility and Cloud Computing (UCC), December 2020. DOI: 10.1109/UCC48980.2020.00043. Citations: 2.

Abstract

Stream Processing has become the de facto standard way of supporting real-time data analytics. Stream Processing applications are typically shaped as pipelines of operators, each record of the stream traversing all the operators of the graph. The placement of these operators on nodes of the platform can evolve over time according to different parameters such as the velocity of the input stream and the capacity of nodes. Such adaptation calls for mechanisms such as dynamic operator scaling and migration. With the advent of Fog Computing, gathering multiple computationally-limited, geographically-distributed resources, these mechanisms need to be decentralized, as a central coordinator orchestrating these actions is no longer a scalable solution. In a fully decentralized vision, each node hosts part of the pipeline. Each node is responsible for the scaling of the operators it runs. More precisely, nodes trigger new instances of the operators they run or shut some of them down. Since the number of replicas of each operator evolves independently, the connections between nodes hosting neighbouring operators in the pipeline need to be maintained. One issue is that, if all these operators can scale in or out dynamically, maintaining a consistent view of their neighbours becomes difficult, calling for synchronization mechanisms to avoid routing inconsistencies and data loss. In this paper, we show that this synchronization problem translates into a particular Group Mutual Exclusion (GME) problem where a group comprises all instances of a given operator of the pipeline and where conflicting groups are those hosting neighbouring operators in the pipeline. The specificity of our problem is that groups are fixed and that each group is in conflict with only one other group at a time.
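The mapping from pipeline to GME instance described above can be illustrated with a short sketch: each operator's replicas form one group, and a group conflicts only with the groups of its immediate upstream and downstream operators. The pipeline and operator names below are hypothetical, chosen only for illustration.

```python
# Sketch of the conflict relation induced by a linear pipeline:
# an operator's group conflicts only with the groups hosting its
# immediate upstream and downstream neighbours.
def conflicting_groups(pipeline, op):
    i = pipeline.index(op)
    neighbours = []
    if i > 0:
        neighbours.append(pipeline[i - 1])      # upstream operator's group
    if i < len(pipeline) - 1:
        neighbours.append(pipeline[i + 1])      # downstream operator's group
    return neighbours

# Hypothetical four-operator pipeline.
pipeline = ["source", "filter", "aggregate", "sink"]
print(conflicting_groups(pipeline, "filter"))   # ['source', 'aggregate']
```

Note that under this relation each group has at most two conflicting groups in total, and, as the abstract states, it is in conflict with only one of them at any given time.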
Based on these constraints, we formulate a new GME algorithm whose message complexity is reduced when compared to algorithms in the literature, while ensuring a high level of concurrent occupancy (the number of processes of the same group executing the critical section, i.e., the scaling mechanism, at the same time).
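The GME property itself (concurrent occupancy for processes of one group, exclusion between conflicting groups) can be sketched as a centralized, thread-based lock. This is an illustrative toy under simplifying assumptions, not the paper's decentralized message-passing algorithm, which achieves the same property without a shared coordinator.

```python
import threading

class GroupLock:
    """Toy group mutual exclusion: any number of processes of the
    current group may occupy the critical section concurrently; a
    process of another group waits until the section empties."""

    def __init__(self):
        self._cond = threading.Condition()
        self._current_group = None   # group currently in the critical section
        self._occupancy = 0          # how many processes of that group are inside

    def enter(self, group):
        with self._cond:
            # Wait while a *different* group occupies the critical section.
            while self._current_group not in (None, group):
                self._cond.wait()
            self._current_group = group
            self._occupancy += 1

    def leave(self):
        with self._cond:
            self._occupancy -= 1
            if self._occupancy == 0:
                # Section is empty: let a conflicting group in.
                self._current_group = None
                self._cond.notify_all()
```

Processes of the same group call `enter(group)` and proceed concurrently, which models several replicas of one operator scaling at the same time; a process of the neighbouring operator's group blocks until occupancy drops to zero. The paper's contribution is obtaining this behaviour in a fully distributed setting with reduced message complexity.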