
Latest publications: Proceedings of the 10th ACM International Conference on Distributed and Event-based Systems

AQuA: adaptive quality analytics
Wei Zhang, Martin Hirzel, D. Grove
Event-processing systems can support high-quality reactions to events by providing context to the event agents. When this context consists of a large amount of data, it helps to train an analytic model for it. In a continuously running solution, this model must be kept up-to-date, otherwise quality degrades. Unfortunately, ripple-through effects make training (whether from scratch or incremental) expensive. This paper tackles the problem of keeping training cost low and model quality high. We propose AQuA, a quality-directed adaptive analytics retraining framework. AQuA incrementally tracks model quality and only retrains when necessary. AQuA can identify both gradual and abrupt model drift. We implement several retraining strategies in AQuA, and find that a sliding-window strategy consistently outperforms the rest. AQuA is simple to implement over off-the-shelf big-data platforms. We evaluate AQuA on two real-world datasets and three widely-used machine learning algorithms, and show that AQuA effectively balances model quality against training effort.
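The retraining loop the abstract describes — track model quality incrementally and retrain from a sliding window only when drift is detected — can be sketched as follows. The class name, thresholds, and the trivial majority-class "model" are all illustrative, not AQuA's actual design:

```python
from collections import deque

class QualityDirectedRetrainer:
    """Toy sketch of quality-directed retraining: track rolling accuracy
    and retrain a trivial majority-class model from a sliding window only
    when tracked quality drops below a threshold."""

    def __init__(self, window=10, quality_threshold=0.5):
        self.window = deque(maxlen=window)        # sliding window of labels
        self.recent_hits = deque(maxlen=window)   # 1 if prediction was correct
        self.quality_threshold = quality_threshold
        self.model = None                         # current majority label
        self.retrain_count = 0

    def _quality(self):
        if not self.recent_hits:
            return 1.0
        return sum(self.recent_hits) / len(self.recent_hits)

    def observe(self, label):
        self.recent_hits.append(1 if self.model == label else 0)
        self.window.append(label)
        # Retrain only when rolling accuracy falls below the threshold.
        if self.model is None or self._quality() < self.quality_threshold:
            self.model = max(set(self.window), key=list(self.window).count)
            self.recent_hits.clear()
            self.retrain_count += 1

r = QualityDirectedRetrainer()
for label in ["a"] * 20 + ["b"] * 20:   # abrupt drift from "a" to "b"
    r.observe(label)
```

After the drift, the tracked quality degrades until the threshold triggers a single retrain from the window, so only two retrainings occur over forty events.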
Citations: 3
Complex event processing for the non-expert with autoCEP: demo
Raef Mousheimish, Y. Taher, K. Zeitouni
The inference mechanisms of CEP engines are completely guided by rules, which are specified manually by domain experts. We argue that this user-based rule specification is a limiting factor, as it requires the experts to have technical knowledge about the CEP language they want to use, it restricts the usage of CEP to merely the detection of straightforward situations, and it restrains its propagation to more advanced fields that require earliness, prediction and proactivity. Therefore, we introduce autoCEP as a data mining-based approach that automatically learns CEP rules from historical traces. autoCEP requires no technical knowledge from domain experts, and it also shows that the generated rules fit for prediction and proactive applications. Satisfactory results from evaluations on real data demonstrate the effectiveness of our framework.
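A toy illustration of learning a detection rule from labeled historical traces — a drastic simplification of autoCEP's data-mining approach; the event names and the pair-mining heuristic are invented for illustration:

```python
from itertools import combinations

def mine_sequence_rule(positive_traces, negative_traces):
    """Find an ordered event pair (a, b) occurring, in order, in every
    positive trace and in no negative trace; a stand-in for real
    sequential-pattern mining over historical traces."""
    def contains_in_order(trace, a, b):
        try:
            return b in trace[trace.index(a) + 1:]
        except ValueError:
            return False  # first event absent from this trace

    events = {e for t in positive_traces for e in t}
    for a, b in combinations(sorted(events), 2):
        for pair in ((a, b), (b, a)):
            if all(contains_in_order(t, *pair) for t in positive_traces) \
               and not any(contains_in_order(t, *pair) for t in negative_traces):
                return pair
    return None

rule = mine_sequence_rule(
    positive_traces=[["temp_rise", "vibration", "failure"],
                     ["vibration", "temp_rise", "vibration"]],
    negative_traces=[["temp_rise", "ok"], ["vibration", "ok"]],
)
```

The mined pair can then be compiled into a CEP sequence rule, which is the step that normally requires expert knowledge of the engine's language.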
Citations: 10
Energy efficient, context-aware cache coding for mobile information-centric networks
Joshua Joy, Yu-Ting Yu, M. Gerla, Ashish Gehani, Hasnain Lakhani, Minyoung Kim
In a mobile, intermittently connected information-centric network (ICN), users download files either from the original source or from caches assembled during previous downloads. Network coding has helped to increase download robustness and overcome "missing coupon" delays. Prior work has also shown that network coding depletes energy resources much faster than no coding. Our contribution here is to make coding more efficient, and to detect when it is not necessary, in order to prolong the life of mobile handhelds. In the network coding context, Cache Coding (i.e., coding performed only on fully cached files) can prevent pollution attacks without significantly reducing diversity and performance with respect to unrestricted code mixing. Cache Coding introduces the first important means to reduce energy consumption by avoiding the extremely processor-intensive homomorphic code used in conventional unrestricted mixing networks. Our second contribution is to detect when Cache Coding is not required and disable it to save precious energy. The proposed Context-Aware Cache Coding (CACC) toggles between using Cache Coding and no coding based on the current network context (e.g., mobility, error rates, file size, etc). Our CACC implementation on Android devices demonstrates that the new scheme improves upon network coding's file delivery rate while keeping energy consumption in check.
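Cache Coding restricts mixing to fully cached files; the coding itself can be illustrated with random linear combinations over GF(2), where encoding is XOR and decoding is Gaussian elimination. This is a sketch of generic network coding under that restriction, not the paper's protocol or its homomorphic-signature machinery:

```python
import random

def xor_encode(blocks, rng):
    """One coded packet: XOR of a random nonempty subset of the blocks.
    Under the Cache Coding restriction, `blocks` would all come from a
    fully cached file."""
    coeffs = [rng.randint(0, 1) for _ in blocks]
    if not any(coeffs):
        coeffs[rng.randrange(len(blocks))] = 1
    payload = 0
    for c, b in zip(coeffs, blocks):
        if c:
            payload ^= b
    return coeffs, payload

def gf2_decode(packets, n):
    """Recover the n original blocks by Gaussian elimination over GF(2);
    returns None until enough linearly independent packets arrive."""
    rows = [(coeffs[:], payload) for coeffs, payload in packets]
    pivots = []
    for col in range(n):
        idx = next((i for i, (c, _) in enumerate(rows) if c[col]), None)
        if idx is None:
            return None
        pivot = rows.pop(idx)
        rows = [([a ^ b for a, b in zip(c, pivot[0])], p ^ pivot[1])
                if c[col] else (c, p) for c, p in rows]
        pivots.append(pivot)
    blocks = [0] * n
    for col in reversed(range(n)):       # back-substitution
        c, p = pivots[col]
        val = p
        for j in range(col + 1, n):
            if c[j]:
                val ^= blocks[j]
        blocks[col] = val
    return blocks

rng = random.Random(7)
blocks = [0b1011, 0b0110, 0b1110]        # three tiny "blocks"
packets, decoded = [], None
while decoded is None:                   # collect packets until decodable
    packets.append(xor_encode(blocks, rng))
    decoded = gf2_decode(packets, len(blocks))
```

XOR coding like this is far cheaper per packet than the homomorphic schemes the abstract mentions, which is exactly the energy argument being made.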
Citations: 2
Shared dictionary compression in publish/subscribe systems
Christoph Doblander, Tanuj Ghinaiya, Kaiwen Zhang, H. Jacobsen
Publish/subscribe is known as a scalable and efficient data dissemination mechanism. Its efficiency comes from the optimized routing algorithms, yet few works exist on employing compression to save bandwidth, which is especially important in mobile environments. State of the art compression methods such as GZip or Deflate can be generally employed to compress messages. In this paper, we show how to reduce bandwidth even further by employing Shared Dictionary Compression (SDC) in pub/sub. However, SDC requires a dictionary to be generated and disseminated prior to compression, which introduces additional computational and bandwidth overhead. To support SDC, we propose a novel and lightweight protocol for pub/sub which employs a new class of brokers, called sampling brokers. Our solution generates and disseminates dictionaries using the sampling brokers. Dictionary maintenance is performed regularly using an adaptive algorithm. The evaluation of our proposed design shows that it is possible to compensate for the introduced overhead and achieve significant bandwidth reduction over Deflate.
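The core idea — prime Deflate with substrings both endpoints already share — is directly expressible with the `zdict` parameter of Python's `zlib`. The dictionary below is an invented stand-in for one a sampling broker might generate from observed traffic:

```python
import zlib

# Pub/sub messages often repeat field names and boilerplate; a shared
# dictionary primes Deflate with those common substrings.
shared_dict = b'{"topic": "sensors/temperature", "unit": "celsius", "value": '
message = b'{"topic": "sensors/temperature", "unit": "celsius", "value": 21.7}'

def deflate(data, zdict=None):
    c = zlib.compressobj(zdict=zdict) if zdict else zlib.compressobj()
    return c.compress(data) + c.flush()

def inflate(data, zdict=None):
    d = zlib.decompressobj(zdict=zdict) if zdict else zlib.decompressobj()
    return d.decompress(data) + d.flush()

plain = deflate(message)                         # no shared dictionary
with_dict = deflate(message, zdict=shared_dict)  # dictionary-primed stream
```

Both sides must hold the same dictionary, which is precisely the generation-and-dissemination overhead the sampling-broker protocol is designed to amortize.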
Citations: 12
SPASS: scalable event stream processing leveraging sharing opportunities: poster
M. Ray, Chuan Lei, Elke A. Rundensteiner
Complex Event Processing (CEP) offers high-performance event analytics in time-critical decision-making applications. Yet supporting high-performance event processing has become increasingly difficult due to the increasing size and complexity of event pattern workloads. In this work, we propose the SPASS framework that leverages time-based event correlations among queries for sharing computation tasks among sequence queries in a workload. We show the NP-hardness of our CEP pattern sharing problem by reducing it from the Minimum Substring Cover problem. The SPASS system finds a shared pattern plan in polynomial-time covering all sequence patterns while still guaranteeing an optimality bound. Further, the SPASS system assures concurrent maintenance and reuse of sub-patterns in the shared pattern plan. Our experimental evaluation confirms that the SPASS framework achieves over 16-fold performance gain compared to the state-of-the-art solutions.
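The kind of sharing SPASS exploits can be illustrated by enumerating contiguous sub-sequences common to several sequence queries — a greedy toy without SPASS's minimum-substring-cover formulation or its optimality bound; the query patterns are invented:

```python
def shared_subpatterns(patterns, min_len=2):
    """Collect sub-sequences of length >= min_len that occur in more
    than one sequence pattern, longest first; each shared sub-sequence
    is a candidate for evaluating once and reusing across queries."""
    def subseqs(p):
        return {tuple(p[i:j]) for i in range(len(p))
                for j in range(i + min_len, len(p) + 1)}

    counts = {}
    for p in patterns:
        for s in subseqs(p):
            counts[s] = counts.get(s, 0) + 1
    shared = [s for s, c in counts.items() if c > 1]
    return sorted(shared, key=len, reverse=True)

queries = [("A", "B", "C", "D"), ("B", "C", "D", "E"), ("X", "B", "C")]
plan = shared_subpatterns(queries)
```

Here the sub-pattern `("B", "C", "D")` is matched once and reused by two queries instead of being evaluated per query, which is where the throughput gain comes from.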
Citations: 0
Dependable distributed content-based publish/subscribe systems: doctoral symposium
P. Salehi, H. Jacobsen
Content-based publish/subscribe systems provide an efficient communication paradigm that allows decoupling of information producers and consumers across location and time. Distributed overlay-based publish/subscribe systems, while scalable, face many problems that hinder their applicability in scenarios requiring dependable communication. In this paper, we discuss three important dimensions of dependability in distributed content-based publish/subscribe systems, namely, availability, reliability and maintainability.
Citations: 0
RxSpatial: a framework for real-time spatio-temporal operations: demo
Abdeltawab M. Hendawi, Youying Shi, H. Fattah, Jumana Karwa, Mohamed H. Ali
Existing commercial database systems provide spatial libraries that support functions on static non-moving spatial objects, e.g., points, linestrings and polygons. Examples of these spatial functions include intersection, distance, buffer, and convex hull computation. The RxSpatial, or Reactive Spatial, library provides the same functionality support in the context of moving objects, and addresses the spatial computation challenges over high-frequency, low-latency, real-time moving objects. This demo presents the RxSpatial library that enables developers to instantly compute spatio-temporal operations in an incremental and streaming fashion. The demo scenarios show the applicability of the library in two real-world applications: spatio-temporal social networks and family locators.
Citations: 6
Proactive scaling of distributed stream processing work flows using workload modelling: doctoral symposium
Thomas Cooper
In recent years there has been significant development in the area of distributed stream processing systems (DSPS) such as Apache Storm, Spark, and Flink. These systems allow complex queries on streaming data to be distributed across multiple worker nodes in a cluster. DSPS often provide the tools to add/remove resources and take advantage of cloud infrastructure to scale their operation. However, the decisions behind this are generally left to the administrators of these systems. There have been several studies focused on finding optimal operator deployments of DSPS operators across a cluster. However, these systems often do not optimise with regard to a given Service Level Agreement (SLA) and where they do, they do not take incoming workload into account. To our knowledge there has been little or no work based around proactively scaling the DSPS with regard to incoming workload in order to maintain SLAs. This PhD will focus on predicting incoming workloads using time series analysis. In order to assess whether a given predicted workload will breach a SLA the response of a DSPS work flow to incoming workload will be modelled using a queuing theory approach. The intention is to build a system that can tune the parameters of this queuing theoretic model, using output metrics such as end-to-end latency and throughput, as the DSPS is running. The end result will be a system that can identify potential SLA breaches before they happen, and initiate a proactive scaling response. Initially, Apache Storm will be used as the test DSPS, however it is anticipated that the system developed during this PhD will be applicable to other DSPS that use a graph-based description of the streaming work flow e.g. Apache Spark and Flink.
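The queueing-theoretic step can be sketched with the simplest possible model: treat each operator replica as an M/M/1 queue and pick the smallest replica count whose predicted sojourn time meets the SLA for a forecast arrival rate. The rates and SLA below are invented, and the proposal's actual model is to be tuned from live latency and throughput metrics rather than assumed:

```python
def predicted_latency(arrival_rate, service_rate, replicas=1):
    """Steady-state M/M/1 sojourn time W = 1 / (mu - lambda) per replica,
    assuming load is balanced evenly across replicas."""
    per_replica = arrival_rate / replicas
    if per_replica >= service_rate:
        return float("inf")  # unstable: the queue grows without bound
    return 1.0 / (service_rate - per_replica)

def replicas_needed(arrival_rate, service_rate, sla_latency):
    """Smallest replica count whose predicted latency meets the SLA:
    the proactive-scaling decision for a forecast arrival rate.
    Assumes sla_latency > 1/service_rate, i.e. the SLA is achievable."""
    n = 1
    while predicted_latency(arrival_rate, service_rate, n) > sla_latency:
        n += 1
    return n

# Forecast: 450 msg/s against 100 msg/s per replica, 50 ms latency SLA.
n = replicas_needed(450.0, 100.0, 0.05)
```

Feeding a workload forecast through such a model yields a scaling decision before the SLA is breached, rather than after.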
Citations: 7
Benchmarking integration pattern implementations 对集成模式实现进行基准测试
Daniel Ritter, Norman May, Kai Sachs, S. Rinderle-Ma
The integration of a growing number of distributed, heterogeneous applications is one of the main challenges of enterprise data management. Through the advent of cloud and mobile application integration, higher volumes of messages have to be processed, compared to common enterprise computing scenarios, while guaranteeing high throughput. However, no previous study has analyzed the impact on message throughput for Enterprise Integration Patterns (EIPs) (e.g., channel creation, routing and transformation). Acknowledging this void, we propose EIPBench, a comprehensive micro-benchmark design for evaluating the message throughput of frequently implemented EIPs and message delivery semantics in productive cloud scenarios. For that, these scenarios are collected and described in a process-driven, TPC-C-like taxonomy, from which the most relevant patterns, message formats, and scale factors are derived as foundation for the benchmark. To prove its applicability, we describe an EIPBench reference implementation and discuss the results of its application to an open source integration system that implements the selected patterns.
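In the spirit of such a micro-benchmark, the sketch below measures message throughput for one pattern implementation, a minimal content-based router. EIPBench's actual scenarios, message formats, and scale factors are far richer; everything here is illustrative:

```python
import time

def content_based_router(message, routes):
    """Minimal content-based router EIP: dispatch on a header field,
    falling back to a dead-letter route for unknown types."""
    return routes.get(message["type"], routes["default"])(message)

def benchmark(pattern, messages):
    """Tiny throughput harness: messages processed per second for one
    pattern implementation."""
    start = time.perf_counter()
    for m in messages:
        pattern(m)
    elapsed = time.perf_counter() - start
    return len(messages) / elapsed if elapsed > 0 else float("inf")

routes = {
    "order": lambda m: ("orders", m),
    "default": lambda m: ("dead-letter", m),
}
msgs = [{"type": "order", "id": i} for i in range(10_000)]
throughput = benchmark(lambda m: content_based_router(m, routes), msgs)
```

Scaling the message count and payload shape per scenario is what a TPC-C-like taxonomy of scale factors would parameterize.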
Citations: 15
Highly-available content-based publish/subscribe via gossiping
P. Salehi, Christoph Doblander, H. Jacobsen
Many publish/subscribe systems are based on a tree topology as their message dissemination overlay. However, in trees, even a single broker failure can cause delivery disruption. Hence, a repair mechanism is required, along with message retransmission to prevent message loss. During repair and recovery, the latency of message delivery can temporarily increase. To address this problem, we present an epidemic protocol to allow a content-based publish/subscribe system to keep delivering messages with low latency, while failed brokers are recovering. Using a broker similarity metric, which takes into account the content space and the overlay topology, we control and direct gossip messages around failed brokers. We compare our approach against a deterministic reliable publish/subscribe approach and an alternative epidemic approach. Based on our evaluations, we show that in our approach, the delivery ratio and latency of message deliveries are close to the deterministic approach, with up to 70% less message overhead than the alternative epidemic approach. Furthermore, our approach is able to provide a higher message delivery ratio than the deterministic alternative at high failure rates or when broker failures follow a non-uniform distribution.
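The similarity-directed gossip can be sketched as a push protocol in which each broker holding the message forwards it to its most similar live neighbors, routing around failed brokers instead of waiting for tree repair. The topology, similarity scores, and fanout below are invented for illustration:

```python
def gossip_disseminate(adjacency, similarity, source, failed, fanout=2):
    """Push gossip: every broker that holds the message forwards it to
    up to `fanout` live neighbors, preferring higher similarity scores
    (standing in for the paper's content/topology similarity metric)."""
    infected = {source}
    frontier = [source]
    while frontier:
        nxt = []
        for node in frontier:
            live = [n for n in adjacency[node] if n not in failed]
            live.sort(key=lambda n: similarity.get((node, n), 0.0),
                      reverse=True)
            for peer in live[:fanout]:
                if peer not in infected:
                    infected.add(peer)
                    nxt.append(peer)
        frontier = nxt
    return infected

# Broker B (D's parent in the dissemination tree) has failed; gossip
# still reaches D via the similar broker C.
adjacency = {"A": ["B", "C"], "B": ["A", "C", "D"],
             "C": ["A", "B", "D"], "D": ["B", "C"]}
similarity = {("A", "C"): 0.9, ("A", "B"): 0.5, ("C", "A"): 0.9,
              ("C", "D"): 0.8, ("C", "B"): 0.1, ("D", "C"): 0.8}
reached = gossip_disseminate(adjacency, similarity, "A", failed={"B"})
```

Biasing peer selection by similarity keeps the redundant traffic directed at brokers likely to need the message, which is how the overhead stays below that of unbiased epidemic dissemination.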
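The abstract describes selecting gossip targets using a broker similarity metric over the content space. As an illustration only, here is a minimal sketch of similarity-biased peer selection, assuming Jaccard similarity over brokers' subscription sets (the paper's actual metric also incorporates the overlay topology; the function and variable names here are hypothetical):

```python
import random


def similarity(a, b):
    """Jaccard similarity between two brokers' subscription content spaces."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def pick_gossip_targets(broker, peers, subs, fanout=2):
    """Pick up to `fanout` gossip targets, biased toward peers whose
    content space is similar to that of `broker`."""
    # Small epsilon keeps every peer reachable even with zero overlap.
    weights = {p: similarity(subs[broker], subs[p]) + 1e-6 for p in peers}
    chosen = []
    pool = list(peers)
    for _ in range(min(fanout, len(pool))):
        # Weighted sample without replacement.
        total = sum(weights[p] for p in pool)
        r = random.random() * total
        acc = 0.0
        for i, p in enumerate(pool):
            acc += weights[p]
            if acc >= r:
                chosen.append(pool.pop(i))
                break
    return chosen
```

For example, a broker subscribed to `{"sports", "news"}` would gossip far more often with a peer covering `{"sports", "news", "tech"}` (similarity 2/3) than with one covering only `{"weather"}` (similarity 0), which is the intuition behind directing gossip around failed brokers in the dissemination tree.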
Citations: 13