AQuA: adaptive quality analytics. Wei Zhang, Martin Hirzel, D. Grove. doi:10.1145/2933267.2933309

Event-processing systems can support high-quality reactions to events by providing context to the event agents. When this context consists of a large amount of data, it helps to train an analytic model over it. In a continuously running solution, this model must be kept up to date; otherwise, quality degrades. Unfortunately, ripple-through effects make training (whether from scratch or incremental) expensive. This paper tackles the problem of keeping training cost low and model quality high. We propose AQuA, a quality-directed adaptive analytics retraining framework. AQuA incrementally tracks model quality and retrains only when necessary; it can identify both gradual and abrupt model drift. We implement several retraining strategies in AQuA and find that a sliding-window strategy consistently outperforms the rest. AQuA is simple to implement over off-the-shelf big-data platforms. We evaluate AQuA on two real-world datasets and three widely used machine learning algorithms, and show that it effectively balances model quality against training effort.
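The abstract gives the mechanism in prose only. As a rough illustration, here is a minimal Python sketch of quality-directed, sliding-window retraining under stated assumptions: the names (QualityDirectedRetrainer, window_size, quality_floor) are hypothetical, quality is plain windowed accuracy, and the model is any scikit-learn-style estimator. This is not AQuA's actual implementation.

```python
from collections import deque

class QualityDirectedRetrainer:
    """Hypothetical sketch: track model accuracy over a sliding window of
    labeled events and retrain only when quality sinks below a floor."""

    def __init__(self, train_fn, window_size=1000, quality_floor=0.9):
        self.train_fn = train_fn                 # e.g., lambda X, y: LogisticRegression().fit(X, y)
        self.window = deque(maxlen=window_size)  # most recent labeled events
        self.hits = deque(maxlen=window_size)    # 1 if the model predicted correctly
        self.quality_floor = quality_floor
        self.model = None

    def observe(self, features, label):
        """Feed one labeled event; retrain if windowed accuracy degrades."""
        if self.model is not None:
            self.hits.append(1 if self.model.predict([features])[0] == label else 0)
        self.window.append((features, label))
        if self.model is None or self._quality() < self.quality_floor:
            self._retrain()

    def _quality(self):
        # A sudden drop here flags abrupt drift; a slow decline flags gradual drift.
        return sum(self.hits) / len(self.hits) if self.hits else 1.0

    def _retrain(self):
        X = [f for f, _ in self.window]
        y = [l for _, l in self.window]
        if X:
            self.model = self.train_fn(X, y)  # sliding-window strategy: fit on recent data only
            self.hits.clear()                 # restart the quality estimate for the new model
```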
{"title":"AQuA: adaptive quality analytics","authors":"Wei Zhang, Martin Hirzel, D. Grove","doi":"10.1145/2933267.2933309","DOIUrl":"https://doi.org/10.1145/2933267.2933309","url":null,"abstract":"Event-processing systems can support high-quality reactions to events by providing context to the event agents. When this context consists of a large amount of data, it helps to train an analytic model for it. In a continuously running solution, this model must be kept up-to-date, otherwise quality degrades. Unfortunately, ripple-through effects make training (whether from scratch or incremental) expensive. This paper tackles the problem of keeping training cost low and model quality high. We propose AQuA, a quality-directed adaptive analytics retraining framework. AQuA incrementally tracks model quality and only retrains when necessary. AQuA can identify both gradual and abrupt model drift. We implement several retraining strategies in AQuA, and find that a sliding-window strategy consistently outperforms the rest. AQuA is simple to implement over off-the-shelf big-data platforms. We evaluate AQuA on two real-world datasets and three widely-used machine learning algorithms, and show that AQuA effectively balances model quality against training effort.","PeriodicalId":277061,"journal":{"name":"Proceedings of the 10th ACM International Conference on Distributed and Event-based Systems","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126175467","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Complex event processing for the non-expert with autoCEP: demo. Raef Mousheimish, Y. Taher, K. Zeitouni. doi:10.1145/2933267.2933296

The inference mechanisms of CEP engines are guided entirely by rules, which domain experts specify manually. We argue that this user-based rule specification is a limiting factor: it requires the experts to have technical knowledge of the CEP language they want to use, it restricts the use of CEP to the detection of straightforward situations, and it restrains its spread into more advanced fields that require earliness, prediction, and proactivity. We therefore introduce autoCEP, a data-mining-based approach that automatically learns CEP rules from historical traces. autoCEP requires no technical knowledge from domain experts, and the generated rules are suitable for prediction and proactive applications. Satisfactory results from evaluations on real data demonstrate the effectiveness of our framework.
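autoCEP's learning algorithm is not detailed in the abstract. Purely to make the trace-to-rule idea concrete, here is a toy Python miner that extracts frequent event subsequences preceding a target event and emits them as CEP-style sequence rules; the function name, rule syntax, and support threshold are all invented.

```python
from collections import Counter
from itertools import combinations

def mine_sequence_rules(traces, target, min_support=0.8, max_len=3):
    """Toy stand-in for rule learning from historical traces: find event
    subsequences that frequently precede `target` and turn each one into a
    CEP-style sequence rule for prediction."""
    # Keep only the events that occurred before the first target occurrence.
    positives = [t[:t.index(target)] for t in traces if target in t]
    counts = Counter()
    for prefix in positives:
        seen = set()
        for n in range(1, max_len + 1):
            for combo in combinations(prefix, n):  # order-preserving subsequences
                seen.add(combo)
        counts.update(seen)                        # count each subsequence once per trace
    rules = []
    for seq, c in counts.items():
        if c / max(len(positives), 1) >= min_support:
            rules.append(f"SEQ({' -> '.join(seq)}) => PREDICT({target})")
    return rules

# Example: sensor-event traces that end in a 'failure' event.
traces = [["heat", "vibration", "failure"],
          ["heat", "noise", "vibration", "failure"]]
print(mine_sequence_rules(traces, "failure"))
```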
{"title":"Complex event processing for the non-expert with autoCEP: demo","authors":"Raef Mousheimish, Y. Taher, K. Zeitouni","doi":"10.1145/2933267.2933296","DOIUrl":"https://doi.org/10.1145/2933267.2933296","url":null,"abstract":"The inference mechanisms of CEP engines are completely guided by rules, which are specified manually by domain experts. We argue that this user-based rule specification is a limiting factor, as it requires the experts to have technical knowledge about the CEP language they want to use, it restricts the usage of CEP to merely the detection of straightforward situations, and it restrains its propagation to more advanced fields that require earliness, prediction and proactivity. Therefore, we introduce autoCEP as a data mining-based approach that automatically learns CEP rules from historical traces. autoCEP requires no technical knowledge from domain experts, and it also shows that the generated rules fit for prediction and proactive applications. Satisfactory results from evaluations on real data demonstrate the effectiveness of our framework.","PeriodicalId":277061,"journal":{"name":"Proceedings of the 10th ACM International Conference on Distributed and Event-based Systems","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129066810","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Energy efficient, context-aware cache coding for mobile information-centric networks. Joshua Joy, Yu-Ting Yu, M. Gerla, Ashish Gehani, Hasnain Lakhani, Minyoung Kim. doi:10.1145/2933267.2940322
In a mobile, intermittently connected information-centric network (ICN), users download files either from the original source or from caches assembled during previous downloads. Network coding has helped to increase download robustness and overcome "missing coupon" delays. Prior work has also shown that network coding depletes energy resources much faster than no coding. Our contribution here is to make coding more efficient and to detect when it is not necessary, in order to prolong the life of mobile handhelds. In the network coding context, Cache Coding (i.e., coding performed only on fully cached files) can prevent pollution attacks without significantly reducing diversity and performance with respect to unrestricted code mixing. Cache Coding introduces the first important means to reduce energy consumption by avoiding the extremely processor-intensive homomorphic code used in conventional unrestricted mixing networks. Our second contribution is to detect when Cache Coding is not required and disable it to save precious energy. The proposed Context-Aware Cache Coding (CACC) toggles between Cache Coding and no coding based on the current network context (e.g., mobility, error rates, and file size). Our CACC implementation on Android devices demonstrates that the new scheme improves upon network coding's file delivery rate while keeping energy consumption in check.
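As a sketch of what toggling on context could look like, the following Python decision function enables Cache Coding only when the observed context suggests coding will pay for its energy cost. The features and thresholds are illustrative guesses, not the actual CACC policy.

```python
from dataclasses import dataclass

@dataclass
class NetworkContext:
    loss_rate: float      # observed packet loss fraction (0..1)
    mobility: float       # e.g., handoffs per minute
    file_size_kb: float   # size of the file being served

def should_use_cache_coding(ctx: NetworkContext) -> bool:
    """Illustrative toggle (not the published CACC rules): code only when the
    link is unstable enough that coded redundancy beats plain forwarding."""
    if ctx.loss_rate < 0.02 and ctx.mobility < 0.5:
        return False   # stable, reliable link: skipping coding saves energy
    if ctx.file_size_kb < 16:
        return False   # tiny files: a retransmission is cheaper than coding
    return True        # lossy or highly mobile: coding avoids missing-coupon stalls

print(should_use_cache_coding(NetworkContext(loss_rate=0.1, mobility=2.0, file_size_kb=512)))
```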
{"title":"Energy efficient, context-aware cache coding for mobile information-centric networks","authors":"Joshua Joy, Yu-Ting Yu, M. Gerla, Ashish Gehani, Hasnain Lakhani, Minyoung Kim","doi":"10.1145/2933267.2940322","DOIUrl":"https://doi.org/10.1145/2933267.2940322","url":null,"abstract":"In a mobile, intermittently connected information-centric network (ICN), users download files either from the original source or from caches assembled during previous downloads. Network coding has helped to increase download robustness and overcome \"missing coupon\" delays. Prior work has also shown that network coding depletes energy resources much faster than no coding. Our contribution here is to make coding more efficient, and to detect when it is not necessary, in order to prolong the life of mobile handhelds. In the network coding context, Cache Coding (i.e., coding performed only on fully cached files) can prevent pollution attacks without significantly reducing diversity and performance with respect to unrestricted code mixing. Cache Coding introduces the first important means to reduce energy consumption by avoiding the extremely processor-intensive homomorphic code used in conventional unrestricted mixing networks. Our second contribution is to detect when Cache Coding is not required and disable it to save precious energy. The proposed Context-Aware Cache Coding (CACC) toggles between using Cache Coding and no coding based on the current network context (e.g., mobility, error rates, file size, etc). Our CACC implementation on Android devices demonstrates that the new scheme improves upon network coding's file delivery rate while keeping energy consumption in check.","PeriodicalId":277061,"journal":{"name":"Proceedings of the 10th ACM International Conference on Distributed and Event-based Systems","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114777844","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shared dictionary compression in publish/subscribe systems. Christoph Doblander, Tanuj Ghinaiya, Kaiwen Zhang, H. Jacobsen. doi:10.1145/2933267.2933308
Publish/subscribe is known as a scalable and efficient data dissemination mechanism. Its efficiency comes from optimized routing algorithms, yet few works exist on employing compression to save bandwidth, which is especially important in mobile environments. State-of-the-art compression methods such as GZip or Deflate can generally be employed to compress messages. In this paper, we show how to reduce bandwidth even further by employing Shared Dictionary Compression (SDC) in pub/sub. However, SDC requires a dictionary to be generated and disseminated prior to compression, which introduces additional computational and bandwidth overhead. To support SDC, we propose a novel, lightweight protocol for pub/sub that employs a new class of brokers, called sampling brokers. Our solution generates and disseminates dictionaries using the sampling brokers; dictionary maintenance is performed regularly using an adaptive algorithm. The evaluation of our proposed design shows that it is possible to compensate for the introduced overhead and achieve significant bandwidth reduction over Deflate.
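Python's standard zlib module supports preset dictionaries (the zdict parameter), the same mechanism SDC builds on, so the core idea can be shown directly. The dictionary content below is a hand-built stand-in for what the paper's sampling brokers would learn from traffic; publisher and subscriber must hold byte-identical dictionaries.

```python
import zlib

# Hand-built preset dictionary seeded with byte strings common to the
# message domain (a stand-in for a broker-learned dictionary).
SHARED_DICT = b'{"topic":"stocks","symbol":"","price":"","volume":""}'

def compress(payload: bytes, zdict: bytes = SHARED_DICT) -> bytes:
    c = zlib.compressobj(level=9, zdict=zdict)
    return c.compress(payload) + c.flush()

def decompress(blob: bytes, zdict: bytes = SHARED_DICT) -> bytes:
    d = zlib.decompressobj(zdict=zdict)
    return d.decompress(blob) + d.flush()

msg = b'{"topic":"stocks","symbol":"ACME","price":"13.37","volume":"900"}'
plain = zlib.compress(msg, 9)        # plain Deflate, no dictionary
shared = compress(msg)               # Deflate with the shared dictionary
# On short messages that resemble the dictionary, the shared variant
# is typically much smaller than plain Deflate.
print(len(msg), len(plain), len(shared))
assert decompress(shared) == msg
```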
{"title":"Shared dictionary compression in publish/subscribe systems","authors":"Christoph Doblander, Tanuj Ghinaiya, Kaiwen Zhang, H. Jacobsen","doi":"10.1145/2933267.2933308","DOIUrl":"https://doi.org/10.1145/2933267.2933308","url":null,"abstract":"Publish/subscribe is known as a scalable and efficient data dissemination mechanism. Its efficiency comes from the optimized routing algorithms, yet few works exist on employing compression to save bandwidth, which is especially important in mobile environments. State of the art compression methods such as GZip or Deflate can be generally employed to compress messages. In this paper, we show how to reduce bandwidth even further by employing Shared Dictionary Compression (SDC) in pub/sub. However, SDC requires a dictionary to be generated and disseminated prior to compression, which introduces additional computational and bandwidth overhead. To support SDC, we propose a novel and lightweight protocol for pub/sub which employs a new class of brokers, called sampling brokers. Our solution generates, and disseminates dictionaries using the sampling brokers. Dictionary maintenance is performed regularly using an adaptive algorithm. The evaluation of our proposed design shows that it is possible to compensate for the introduced overhead and achieve significant bandwidth reduction over Deflate.","PeriodicalId":277061,"journal":{"name":"Proceedings of the 10th ACM International Conference on Distributed and Event-based Systems","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128235054","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SPASS: scalable event stream processing leveraging sharing opportunities: poster. M. Ray, Chuan Lei, Elke A. Rundensteiner. doi:10.1145/2933267.2933288

Complex Event Processing (CEP) offers high-performance event analytics in time-critical decision-making applications. Yet supporting high-performance event processing has become increasingly difficult due to the growing size and complexity of event pattern workloads. In this work, we propose the SPASS framework, which leverages time-based event correlations among queries to share computation tasks among the sequence queries in a workload. We show the NP-hardness of our CEP pattern-sharing problem by reduction from the Minimum Substring Cover problem. The SPASS system finds, in polynomial time, a shared pattern plan covering all sequence patterns while still guaranteeing an optimality bound. Further, SPASS ensures concurrent maintenance and reuse of sub-patterns in the shared pattern plan. Our experimental evaluation confirms that SPASS achieves over a 16-fold performance gain compared to state-of-the-art solutions.
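The abstract ties the problem to Minimum Substring Cover. To make that connection concrete, here is a toy greedy Python sketch that picks sub-patterns (substrings over event types) whose reuse most reduces the total number of evaluation segments across queries. It is illustrative only: SPASS computes its shared plan with a proven optimality bound, which this greedy loop does not provide.

```python
def min_segments(pattern, shared):
    """Fewest segments needed to tile `pattern` from shared sub-patterns;
    single event types are always available as fallback segments."""
    n = len(pattern)
    dp = [0] + [float("inf")] * n
    for i in range(1, n + 1):
        for j in range(i):
            seg = pattern[j:i]
            if len(seg) == 1 or seg in shared:
                dp[i] = min(dp[i], dp[j] + 1)
    return dp[n]

def greedy_shared_plan(patterns, max_shared=10):
    """Greedily add the sub-pattern that most lowers the summed segment
    count over all queries, stopping when nothing improves."""
    candidates = {p[i:j] for p in patterns
                  for i in range(len(p)) for j in range(i + 2, len(p) + 1)}
    shared = set()
    cost = sum(min_segments(p, shared) for p in patterns)
    for _ in range(max_shared):
        best, best_cost = None, cost
        for cand in candidates - shared:
            c = sum(min_segments(p, shared | {cand}) for p in patterns)
            if c < best_cost:
                best, best_cost = cand, c
        if best is None:
            break
        shared.add(best)
        cost = best_cost
    return shared

# Event types as characters, sequence queries as strings.
print(greedy_shared_plan(["ABCD", "BCDE", "XBCD"]))  # picks 'BCD' for reuse
```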
{"title":"SPASS: scalable event stream processing leveraging sharing opportunities: poster","authors":"M. Ray, Chuan Lei, Elke A. Rundensteiner","doi":"10.1145/2933267.2933288","DOIUrl":"https://doi.org/10.1145/2933267.2933288","url":null,"abstract":"Complex Event Processing (CEP) offers high-performance event analytics in time-critical decision-making applications. Yet supporting high-performance event processing has become increasingly difficult due to the increasing size and complexity of event pattern workloads. In this work, we propose the SPASS framework that leverages time-based event correlations among queries for sharing computation tasks among sequence queries in a workload. We show the NP-hardness of our CEP pattern sharing problem by reducing it from the Minimum Substring Cover problem. The SPASS system finds a shared pattern plan in polynomial-time covering all sequence patterns while still guaranteeing an optimality bound. Further, the SPASS system assures concurrent maintenance and reuse of sub-patterns in the shared pattern plan. Our experimental evaluation confirms that the SPASS framework achieves over 16-fold performance gain compared to the state-of-the-art solutions.","PeriodicalId":277061,"journal":{"name":"Proceedings of the 10th ACM International Conference on Distributed and Event-based Systems","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128249323","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dependable distributed content-based publish/subscribe systems: doctoral symposium. P. Salehi, H. Jacobsen. doi:10.1145/2933267.2933433

Content-based publish/subscribe systems provide an efficient communication paradigm that decouples information producers and consumers across location and time. Distributed overlay-based publish/subscribe systems, while scalable, face many problems that hinder their applicability in scenarios requiring dependable communication. In this paper, we discuss three important dimensions of dependability in distributed content-based publish/subscribe systems: availability, reliability, and maintainability.
{"title":"Dependable distributed content-based publish/subscribe systems: doctoral symposium","authors":"P. Salehi, H. Jacobsen","doi":"10.1145/2933267.2933433","DOIUrl":"https://doi.org/10.1145/2933267.2933433","url":null,"abstract":"Content-based publish/subscribe systems provide an efficient communication paradigm that allows decoupling of information producers and consumers across location and time. Distributed overlay-based publish/subscribe systems, while scalable, face many problems that hinders their applicability in scenarios requiring dependable communication. In this paper, we discuss three important dimensions of dependability in distributed content-based publish/subscribe systems, namely, availability, reliability and maintainability.","PeriodicalId":277061,"journal":{"name":"Proceedings of the 10th ACM International Conference on Distributed and Event-based Systems","volume":"36 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133107272","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
RxSpatial: a framework for real-time spatio-temporal operations: demo. Abdeltawab M. Hendawi, Youying Shi, H. Fattah, Jumana Karwa, Mohamed H. Ali. doi:10.1145/2933267.2933293
Existing commercial database systems provide spatial libraries that support functions on static, non-moving spatial objects, e.g., points, linestrings, and polygons. Examples of these spatial functions include intersection, distance, buffer, and convex-hull computation. The RxSpatial (Reactive Spatial) library provides the same functionality for moving objects, and addresses the challenges of spatial computation over high-frequency, low-latency, real-time moving objects. This demo presents the RxSpatial library, which enables developers to evaluate spatio-temporal operations in an incremental, streaming fashion. The demo scenarios show the applicability of the library in two real-world applications: spatio-temporal social networks and family locators.
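To illustrate what incremental, streaming evaluation means here, the sketch below keeps only the latest position of two moving objects and re-emits their distance (plus a proximity flag) on every update, rather than recomputing over full trajectories. The class and callback names are hypothetical and do not reflect the RxSpatial API.

```python
import math

def haversine_m(p, q):
    """Great-circle distance in meters between two (lat, lon) points."""
    (lat1, lon1), (lat2, lon2) = ((math.radians(a), math.radians(b)) for a, b in (p, q))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * 6371000 * math.asin(math.sqrt(h))

class ReactiveDistance:
    """Hypothetical reactive operator: re-evaluate the distance between two
    moving objects on each position update and fire a proximity event."""

    def __init__(self, on_distance, proximity_m=100.0):
        self.pos = {}                   # object id -> latest (lat, lon)
        self.on_distance = on_distance  # callback(distance_m, within_proximity)
        self.proximity_m = proximity_m

    def update(self, obj_id, lat, lon):
        self.pos[obj_id] = (lat, lon)
        if len(self.pos) == 2:          # both objects have reported at least once
            a, b = self.pos.values()
            d = haversine_m(a, b)
            self.on_distance(d, d <= self.proximity_m)

# Family-locator flavor: alert when two members come within 100 m.
tracker = ReactiveDistance(lambda d, near: print(f"{d:.0f} m, near={near}"))
tracker.update("alice", 47.6097, -122.3331)
tracker.update("bob",   47.6101, -122.3420)  # fires on every subsequent update
```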
{"title":"RxSpatial: a framework for real-time spatio-temporal operations: demo","authors":"Abdeltawab M. Hendawi, Youying Shi, H. Fattah, Jumana Karwa, Mohamed H. Ali","doi":"10.1145/2933267.2933293","DOIUrl":"https://doi.org/10.1145/2933267.2933293","url":null,"abstract":"Existing commercial database systems provide spatial libraries that support functions on static non-moving spatial objects, e.g., points, linestrings and polygons. Examples of these spatial functions include intersection, distance, buffer, and convex hull computation. The RxSpatial, or Reactive Spatial, library provides the same functionality support in the context of moving objects, and addresses the spatial computation challenges over high-frequency, low-latency, real-time moving objects. This demo presents the RxSpatial library that enables developers to instantly compute spatio-temporal operations in an incremental and streaming fashion. The demo scenarios show the applicability of the library in two real-world applications: spatio-temporal social networks and family locators.","PeriodicalId":277061,"journal":{"name":"Proceedings of the 10th ACM International Conference on Distributed and Event-based Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129879291","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Proactive scaling of distributed stream processing work flows using workload modelling: doctoral symposium. Thomas Cooper. doi:10.1145/2933267.2933429

In recent years there has been significant development in the area of distributed stream processing systems (DSPS) such as Apache Storm, Spark, and Flink. These systems allow complex queries on streaming data to be distributed across multiple worker nodes in a cluster. DSPS often provide tools to add or remove resources and take advantage of cloud infrastructure to scale their operation; however, these decisions are generally left to the administrators of these systems. Several studies have focused on finding optimal deployments of DSPS operators across a cluster. However, these systems often do not optimise with regard to a given Service Level Agreement (SLA), and where they do, they do not take the incoming workload into account. To our knowledge, there has been little or no work on proactively scaling a DSPS with regard to incoming workload in order to maintain SLAs. This PhD will focus on predicting incoming workloads using time-series analysis. To assess whether a given predicted workload will breach an SLA, the response of a DSPS work flow to incoming workload will be modelled using a queueing-theory approach. The intention is to build a system that can tune the parameters of this queueing model, using output metrics such as end-to-end latency and throughput, while the DSPS is running. The end result will be a system that can identify potential SLA breaches before they happen and initiate a proactive scaling response. Initially, Apache Storm will be used as the test DSPS, but it is anticipated that the system developed during this PhD will be applicable to other DSPS that use a graph-based description of the streaming work flow, e.g., Apache Spark and Flink.
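A minimal Python sketch of the proposed pipeline, assuming an EWMA forecast for the workload and a per-worker M/M/1 approximation W = 1/(mu - lambda/k) for end-to-end latency: the actual system would fit and tune its queueing model online, and all rates and thresholds below are invented.

```python
def ewma_forecast(arrival_rates, alpha=0.3):
    """Exponentially weighted forecast of the next arrival rate (events/s)."""
    f = arrival_rates[0]
    for x in arrival_rates[1:]:
        f = alpha * x + (1 - alpha) * f
    return f

def workers_needed(lam, mu, sla_latency_s):
    """Smallest worker count k such that per-worker M/M/1 sojourn time
    W = 1 / (mu - lam/k) meets the SLA. A deliberately crude stand-in for
    the tuned queueing model the abstract describes (assumes lam, mu > 0)."""
    k = 1
    while True:
        per_worker = lam / k
        if per_worker < mu and 1.0 / (mu - per_worker) <= sla_latency_s:
            return k
        k += 1

history = [800, 950, 1100, 1300, 1500]   # observed events/s per interval
lam = ewma_forecast(history)             # predicted incoming workload
k = workers_needed(lam, mu=400.0, sla_latency_s=0.05)
print(f"forecast {lam:.0f} ev/s -> scale to {k} workers before the SLA breaks")
```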
{"title":"Proactive scaling of distributed stream processing work flows using workload modelling: doctoral symposium","authors":"Thomas Cooper","doi":"10.1145/2933267.2933429","DOIUrl":"https://doi.org/10.1145/2933267.2933429","url":null,"abstract":"In recent years there has been significant development in the area of distributed stream processing systems (DSPS) such as Apache Storm, Spark, and Flink. These systems allow complex queries on streaming data to be distributed across multiple worker nodes in a cluster. DSPS often provide the tools to add/remove resources and take advantage of cloud infrastructure to scale their operation. However, the decisions behind this are generally left to the administrators of these systems. There have been several studies focused on finding optimal operator deployments of DSPS operators across a cluster. However, these systems often do not optimise with regard to a given Service Level Agreement (SLA) and where they do, they do not take incoming workload into account. To our knowledge there has been little or no work based around proactively scaling the DSPS with regard to incoming workload in order to maintain SLAs. This PhD will focus on predicting incoming workloads using time series analysis. In order to assess whether a given predicted workload will breach a SLA the response of a DSPS work flow to incoming workload will be modelled using a queuing theory approach. The intention is to build a system that can tune the parameters of this queuing theoretic model, using output metrics such as end-to-end latency and throughput, as the DSPS is running. The end result will be a system that can identify potential SLA breaches before they happen, and initiate a proactive scaling response. Initially, Apache Storm will be used as the test DSPS, however it is anticipated that the system developed during this PhD will be applicable to other DSPS that use a graph-based description of the streaming work flow e.g. Apache Spark and Flink.","PeriodicalId":277061,"journal":{"name":"Proceedings of the 10th ACM International Conference on Distributed and Event-based Systems","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121004715","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Benchmarking integration pattern implementations. Daniel Ritter, Norman May, Kai Sachs, S. Rinderle-Ma. doi:10.1145/2933267.2933269
The integration of a growing number of distributed, heterogeneous applications is one of the main challenges of enterprise data management. With the advent of cloud and mobile application integration, higher volumes of messages have to be processed than in common enterprise computing scenarios, while high throughput must still be guaranteed. However, no previous study has analyzed the message-throughput impact of Enterprise Integration Patterns (EIPs) (e.g., channel creation, routing, and transformation). Acknowledging this void, we propose EIPBench, a comprehensive micro-benchmark design for evaluating the message throughput of frequently implemented EIPs and message delivery semantics in productive cloud scenarios. To that end, these scenarios are collected and described in a process-driven, TPC-C-like taxonomy, from which the most relevant patterns, message formats, and scale factors are derived as the foundation for the benchmark. To prove its applicability, we describe an EIPBench reference implementation and discuss the results of applying it to an open-source integration system that implements the selected patterns.
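EIPBench's harness is not shown in the abstract; as a toy illustration of throughput micro-benchmarking for one pattern, the Python below drives a minimal content-based router with a fixed message set and reports messages per second. EIPBench itself additionally covers scale factors, message-format mixes, and delivery semantics, none of which is modeled here.

```python
import time

def content_based_router(msg, routes):
    """Minimal content-based router EIP: deliver to the first channel whose
    predicate matches the message content."""
    for predicate, channel in routes:
        if predicate(msg):
            channel.append(msg)
            return
    raise ValueError("no matching route")

def benchmark(pattern_fn, messages, *args):
    """Toy throughput harness: drive one pattern implementation with a fixed
    message set and report messages/second."""
    start = time.perf_counter()
    for msg in messages:
        pattern_fn(msg, *args)
    elapsed = time.perf_counter() - start
    return len(messages) / elapsed

orders, alerts = [], []
routes = [(lambda m: m["type"] == "order", orders),
          (lambda m: True, alerts)]                  # default channel
msgs = [{"type": "order" if i % 2 else "alert", "id": i} for i in range(100_000)]
print(f"{benchmark(content_based_router, msgs, routes):,.0f} msg/s")
```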
{"title":"Benchmarking integration pattern implementations","authors":"Daniel Ritter, Norman May, Kai Sachs, S. Rinderle-Ma","doi":"10.1145/2933267.2933269","DOIUrl":"https://doi.org/10.1145/2933267.2933269","url":null,"abstract":"The integration of a growing number of distributed, heterogeneous applications is one of the main challenges of enterprise data management. Through the advent of cloud and mobile application integration, higher volumes of messages have to be processed, compared to common enterprise computing scenarios, while guaranteeing high throughput. However, no previous study has analyzed the impact on message throughput for Enterprise Integration Patterns (EIPs) (e. g., channel creation, routing and transformation). Acknowledging this void, we propose EIPBench, a comprehensive micro-benchmark design for evaluating the message throughput of frequently implemented EIPs and message delivery semantics in productive cloud scenarios. For that, these scenarios are collected and described in a process-driven, TPC-C-like taxonomy, from which the most relevant patterns, message formats, and scale factors are derived as foundation for the benchmark. To prove its applicability, we describe an EIPBench reference implementation and discuss the results of its application to an open source integration system that implements the selected patterns.","PeriodicalId":277061,"journal":{"name":"Proceedings of the 10th ACM International Conference on Distributed and Event-based Systems","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121549633","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Highly-available content-based publish/subscribe via gossiping. P. Salehi, Christoph Doblander, H. Jacobsen. doi:10.1145/2933267.2933303

Many publish/subscribe systems use a tree topology as their message dissemination overlay. In trees, however, even a single broker failure can cause delivery disruption. Hence, a repair mechanism is required, along with message retransmission to prevent message loss. During repair and recovery, the latency of message delivery can temporarily increase. To address this problem, we present an epidemic protocol that allows a content-based publish/subscribe system to keep delivering messages with low latency while failed brokers recover. Using a broker similarity metric, which takes into account the content space and the overlay topology, we control and direct gossip messages around failed brokers. We compare our approach against a deterministic reliable publish/subscribe approach and an alternative epidemic approach. Our evaluations show that the delivery ratio and latency of our approach are close to those of the deterministic approach, with up to 70% less message overhead than the alternative epidemic approach. Furthermore, our approach provides a higher message delivery ratio than the deterministic alternative at high failure rates or when broker failures follow a non-uniform distribution.
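The paper defines its own broker similarity metric; purely to illustrate the shape of the idea, the sketch below scores peers by Jaccard overlap of their content space discounted by overlay distance, then picks gossip partners with probability proportional to that score. The fields, weights, and 50/50 blend are invented.

```python
import random

def similarity(broker_a, broker_b, hops, max_hops=10):
    """Toy broker similarity: content-space overlap (Jaccard over covered
    subscriptions) blended with overlay proximity. Not the paper's metric."""
    subs_a, subs_b = broker_a["subs"], broker_b["subs"]
    content = len(subs_a & subs_b) / max(len(subs_a | subs_b), 1)
    proximity = 1.0 - min(hops, max_hops) / max_hops
    return 0.5 * content + 0.5 * proximity

def pick_gossip_targets(me, peers, hops_to, fanout=2):
    """Choose gossip partners with probability proportional to similarity, so
    publications detour around failed brokers toward interested subtrees."""
    weights = [similarity(me, p, hops_to[p["id"]]) for p in peers]
    return random.choices(peers, weights=weights, k=fanout)

me = {"id": "b1", "subs": {"sports", "tech"}}
peers = [{"id": "b2", "subs": {"tech"}}, {"id": "b3", "subs": {"news"}}]
print([p["id"] for p in pick_gossip_targets(me, peers, {"b2": 1, "b3": 4})])
```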
{"title":"Highly-available content-based publish/subscribe via gossiping","authors":"P. Salehi, Christoph Doblander, H. Jacobsen","doi":"10.1145/2933267.2933303","DOIUrl":"https://doi.org/10.1145/2933267.2933303","url":null,"abstract":"Many publish/subscribe systems are based on a tree topology as their message dissemination overlay. However, in trees, even a single broker failure can cause delivery disruption. Hence, a repair mechanism is required, along with message retransmission to prevent message loss. During repair and recovery, the latency of message delivery can temporarily increase. To address this problem, we present an epidemic protocol to allow a content-based publish/subscribe system to keep delivering messages with low latency, while failed brokers are recovering. Using a broker similarity metric, which takes into account the content space and the overlay topology, we control and direct gossip messages around failed brokers. We compare our approach against a deterministic reliable publish/subscribe approach and an alternative epidemic approach. Based on our evaluations, we show that in our approach, the delivery ratio and latency of message deliveries are close to the deterministic approach, with up to 70% less message overhead than the alternative epidemic approach. Furthermore, our approach is able to provide a higher message delivery ratio than the deterministic alternative at high failure rates or when broker failures follow a non-uniform distribution.","PeriodicalId":277061,"journal":{"name":"Proceedings of the 10th ACM International Conference on Distributed and Event-based Systems","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128617545","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}