
Proceedings of the 20th International Middleware Conference: Latest Publications

Scalable Data-structures with Hierarchical, Distributed Delegation
Pub Date : 2019-12-09 DOI: 10.1145/3361525.3361537
Yuxin Ren, Gabriel Parmer
Scaling data-structures up to the increasing number of cores provided by modern systems is challenging. The quest for scalability is complicated by the non-uniform memory accesses (NUMA) of multi-socket machines, which often prohibit the effective use of data-structures that span memory localities. Conventional shared memory data-structures using efficient non-blocking or lock-based implementations inevitably suffer from cache-coherency overheads and non-local memory accesses between sockets. Multi-socket systems are common in cloud hardware, and many products are pushing shared memory systems to greater scales, making the ability to scale data-structures all the more pressing. In this paper, we present the Distributed, Delegated Parallel Sections (DPS) runtime system, which uses message-passing to move the computation on portions of data-structures between memory localities, while leveraging efficient shared memory implementations within each locality to harness efficient parallelism. We show through a series of data-structure scalability evaluations, and through an adaptation of memcached, that DPS enables strong data-structure scalability. DPS provides more than a 3.1x improvement in throughput and a 23x reduction in tail latency for memcached.
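The delegation idea can be sketched in a few lines of Python. This is a toy, in-process analogue (the class and its message format are invented for illustration; DPS itself works across NUMA sockets): each partition of the structure is owned by one worker thread, and callers pass operations as messages rather than sharing the structure itself.

```python
import queue
import threading
from concurrent.futures import Future

class DelegatedMap:
    """Toy delegation sketch: each partition is owned by a single worker
    thread, so the underlying dict is never shared across 'localities'."""

    def __init__(self, n_partitions=2):
        self._queues = [queue.Queue() for _ in range(n_partitions)]
        self._n = n_partitions
        for q in self._queues:
            threading.Thread(target=self._worker, args=(q,), daemon=True).start()

    def _worker(self, q):
        local = {}  # touched only by this worker: no cross-locality sharing
        while True:
            op, key, value, fut = q.get()
            if op == "put":
                local[key] = value
                fut.set_result(None)
            else:  # "get"
                fut.set_result(local.get(key))

    def _delegate(self, op, key, value=None):
        # message-passing takes the place of locks or atomic instructions
        fut = Future()
        self._queues[hash(key) % self._n].put((op, key, value, fut))
        return fut.result()

    def put(self, key, value):
        self._delegate("put", key, value)

    def get(self, key):
        return self._delegate("get", key)
```

The point of the sketch is the ownership discipline: because exactly one thread touches each partition, the worker can use a plain unsynchronized dictionary, which is the shared-memory efficiency DPS exploits within a locality.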
Citations: 3
On the FaaS Track: Building Stateful Distributed Applications with Serverless Architectures
Pub Date : 2019-12-09 DOI: 10.1145/3361525.3361535
Daniel Barcelona Pons, Marc Sánchez Artigas, Gerard París, P. Sutra, P. López
Serverless computing is an emerging paradigm that greatly simplifies the usage of cloud resources and suits many tasks well. Most notably, Function-as-a-Service (FaaS) enables programmers to develop cloud applications as individual functions that can run and scale independently. Yet, due to the disaggregation of storage and compute resources in FaaS, applications that require fine-grained support for mutable state and synchronization, such as machine learning and scientific computing, are hard to build. In this work, we present Crucial, a system to program highly-concurrent stateful applications with serverless architectures. Its programming model keeps the simplicity of FaaS and makes it possible to port multi-threaded algorithms to this new environment with little effort. Crucial is built upon the key insight that FaaS resembles concurrent programming at the scale of a data center. As a consequence, a distributed shared memory layer is the right answer to the need for fine-grained state management and coordination in serverless computing. We validate our system with the help of micro-benchmarks and various applications. In particular, we implement two common machine learning algorithms: k-means clustering and logistic regression. For both cases, Crucial obtains superior or comparable performance to an equivalent Spark cluster.
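The "FaaS resembles data-center-scale concurrent programming" insight can be illustrated with a small sketch. Here threads stand in for function invocations and an in-process object stands in for Crucial's distributed shared memory layer (the class and its methods are invented for illustration, not Crucial's API):

```python
import threading

class SharedCounter:
    """In-process stand-in for a shared-state object: real serverless
    functions would reach it over the network."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def increment(self, delta=1):
        with self._lock:  # fine-grained synchronization across functions
            self._value += delta
            return self._value

    def read(self):
        with self._lock:
            return self._value

def function_instance(counter, updates):
    # one 'cloud function': mutates shared state exactly as a thread would
    for _ in range(updates):
        counter.increment()

counter = SharedCounter()
workers = [threading.Thread(target=function_instance, args=(counter, 1000))
           for _ in range(8)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(counter.read())  # 8000
```

The multi-threaded structure of the program is unchanged; only the backing store would move from process memory to a shared layer, which is exactly the porting story the abstract describes.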
Citations: 84
Lazarus: Automatic Management of Diversity in BFT Systems
Pub Date : 2019-12-09 DOI: 10.1145/3361525.3361550
Miguel García, A. Bessani, N. Neves
A long-standing promise of Byzantine Fault-Tolerant (BFT) replication is to maintain service correctness despite the presence of malicious failures. The key challenge here is how to ensure replicas fail independently, i.e., to prevent a single attack from compromising more than f replicas at once. The obvious answer is the use of diverse replicas, but most works in BFT simply assume such diversity without supporting mechanisms to substantiate this assumption. Lazarus is a control plane for managing the deployment and execution of diverse replicas in BFT systems. Lazarus continuously monitors the current vulnerabilities of the system replicas (reported in security feeds such as NVD and ExploitDB) and employs a metric to measure the risk of having a common weakness in the replica set. If such risk is high, the set of replicas is reconfigured. Our evaluation shows that the devised strategy reduces the number of executions where the system becomes compromised and that our prototype supports the execution of full-fledged BFT systems in diverse configurations with 17 OS versions, reaching a performance close to a homogeneous bare-metal setup.
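A minimal sketch of the risk-driven reconfiguration loop, with an invented risk metric (Lazarus' actual metric is richer): count vulnerabilities shared by two or more replicas in a candidate set, since a shared flaw lets one exploit compromise several replicas at once, then reconfigure to the candidate with the lowest risk.

```python
from itertools import combinations

def common_weakness_risk(replica_set, vulns):
    """Illustrative risk metric: number of vulnerabilities shared by at
    least two replicas in the set (not Lazarus' actual formula)."""
    shared = set()
    for a, b in combinations(replica_set, 2):
        shared |= vulns[a] & vulns[b]
    return len(shared)

def pick_configuration(candidates, vulns):
    # reconfigure to the candidate replica set with the lowest shared risk
    return min(candidates, key=lambda s: common_weakness_risk(s, vulns))

# made-up OS -> vulnerability-feed data for illustration
vulns = {
    "ubuntu18": {"CVE-1", "CVE-2"},
    "debian10": {"CVE-1", "CVE-3"},   # shares CVE-1 with ubuntu18
    "freebsd12": {"CVE-4"},
    "openbsd6": {"CVE-5"},
}
cfg = pick_configuration(
    [("ubuntu18", "debian10", "freebsd12"),
     ("ubuntu18", "freebsd12", "openbsd6")],
    vulns,
)
print(cfg)  # ('ubuntu18', 'freebsd12', 'openbsd6')
```

The first candidate scores 1 (CVE-1 is shared), the second 0, so the controller would deploy the second, more diverse set.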
Citations: 17
AccTEE
Pub Date : 2019-12-09 DOI: 10.1145/3361525.3361541
David Goltzsche, Manuel Nieke, Thomas Knauth, Rüdiger Kapitza
Remote computation has numerous use cases such as cloud computing, client-side web applications or volunteer computing. Typically, these computations are executed inside a sandboxed environment for two reasons: first, to isolate the execution in order to protect the host environment from unauthorised access, and second, to control and restrict resource usage. Often, there is mutual distrust between entities providing the code and the ones executing it, owing to concerns over three potential problems: (i) loss of control over code and data by the providing entity, (ii) uncertainty of the integrity of the execution environment for customers, and (iii) a missing mutually trusted accounting of resource usage. In this paper we present AccTEE, a two-way sandbox that offers remote computation with resource accounting trusted by consumers and providers. AccTEE leverages two recent technologies: hardware-protected trusted execution environments, and WebAssembly, a platform-independent bytecode format. We show how AccTEE uses automated code instrumentation for fine-grained resource accounting while maintaining confidentiality and integrity of code and data. Our evaluation of AccTEE in three scenarios -- volunteer computing, serverless computing, and pay-by-computation for the web -- shows a maximum accounting overhead of 10%.
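The instrumentation idea can be sketched on a toy stack machine (the instruction set and block format here are invented; AccTEE instruments real WebAssembly): before each basic block, insert an accounting instruction that charges the block's cost, so the counter is trustworthy as long as the instrumented code is what actually runs.

```python
def instrument(blocks):
    """Insert a counter update at the head of every basic block, charging
    the block's instruction count upfront (toy analogue of AccTEE's pass)."""
    instrumented = []
    for block in blocks:
        instrumented.append(("account", len(block)))
        instrumented.extend(block)
    return instrumented

def run(program):
    """Tiny stack-machine interpreter that also tracks accounted cost."""
    counter = 0
    stack = []
    for op in program:
        if op[0] == "account":
            counter += op[1]
        elif op[0] == "push":
            stack.append(op[1])
        elif op[0] == "add":
            stack.append(stack.pop() + stack.pop())
    return stack, counter

blocks = [[("push", 2), ("push", 3)], [("add",)]]
stack, used = run(instrument(blocks))
print(stack, used)  # [5] 3
```

Charging per basic block rather than per instruction is what keeps the accounting overhead low: one counter update amortizes over the whole block.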
Citations: 41
Monitorless
Pub Date : 2019-12-09 DOI: 10.1145/3361525.3361543
Johannes Grohmann, Patrick K. Nicholson, Jesús Omana Iglesias, Samuel Kounev, D. Lugones
Today, software operation engineers rely on application key performance indicators (KPIs) for sizing and orchestrating cloud resources dynamically. KPIs are monitored to assess the achievable performance and to configure various cloud-specific parameters such as flavors of instances and autoscaling rules, among others. Usually, keeping KPIs within acceptable levels requires application expertise, which is expensive and can slow down the continuous delivery of software. Expertise is required because KPIs are normally based on application-specific quality-of-service metrics, like service response time and processing rate, instead of generic platform metrics, like those typical across various environments (e.g., CPU and memory utilization, I/O rate, etc.). In this paper, we investigate the feasibility of outsourcing the management of application performance from developers to cloud operators. In the same way that the serverless paradigm allows the execution environment to be fully managed by a third party, we discuss a monitorless model to streamline application deployment by delegating performance management. We show that training a machine learning model with platform-level data, collected from the execution of representative containerized services, allows inferring application KPI degradation. This is an opportunity to simplify operations as engineers can rely solely on platform metrics -- while still fulfilling application KPIs -- to configure portable and application-agnostic rules and other cloud-specific parameters to automatically trigger actions such as autoscaling, instance migration, network slicing, etc. Results show that monitorless infers KPI degradation with an accuracy of 97% and, notably, it performs similarly to typical autoscaling solutions, even when autoscaling rules are optimally tuned with knowledge of the expected workload.
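The core inference step can be illustrated with a deliberately minimal model: learn a single threshold on one platform metric that separates "KPI degraded" from "healthy" training samples. The data and the decision-stump learner below are invented stand-ins; the paper's model and feature set are far richer.

```python
def train_stump(samples):
    """Fit a one-feature threshold classifier on (metrics, degraded) pairs,
    choosing the feature/threshold pair with the fewest training errors."""
    best = None
    n_features = len(samples[0][0])
    for f in range(n_features):
        for metrics, _ in samples:
            t = metrics[f]
            errors = sum(1 for m, label in samples if (m[f] >= t) != label)
            if best is None or errors < best[2]:
                best = (f, t, errors)
    feature, threshold, _ = best
    return lambda metrics: metrics[feature] >= threshold

# synthetic platform samples: (cpu_util, io_wait) -> KPI degraded?
training = [
    ((0.30, 0.05), False), ((0.45, 0.10), False),
    ((0.92, 0.40), True),  ((0.88, 0.55), True),
]
degraded = train_stump(training)
print(degraded((0.95, 0.5)))  # True
print(degraded((0.20, 0.0)))  # False
```

Once such a model exists, scaling rules can fire on its output instead of on application-specific KPI thresholds, which is the "monitorless" operating mode the abstract argues for.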
Citations: 24
Switchboard: A Middleware for Wide-Area Service Chaining
Pub Date : 2019-12-09 DOI: 10.1145/3361525.3361555
Abhigyan Sharma, Yoji Ozawa, M. Hiltunen, Kaustubh R. Joshi, R. Schlichting, Zhaoyu Gao
Production networks are transitioning from the use of physical middleboxes to virtual network functions (VNFs), which makes it easy to construct highly-customized service chains of VNFs dynamically using software. Wide-area service chains are increasingly important given the emergence of heterogeneous execution platforms consisting of customer premise equipment (CPE), small edge cloud sites, and large centralized cloud data centers, since only part of the service chain can be deployed at the CPE and even the closest edge site may not always be able to process all the customers' traffic. Switchboard is a middleware for realizing and managing such an ecosystem of diverse VNFs and cloud platforms. It exploits principles from service-oriented architectures to treat VNFs as independent services, and provides a traffic routing platform shared by all VNFs. Moreover, Switchboard's global controller optimizes wide-area routes based on a holistic view of customer traffic as well as the resources available at VNFs and the underlying network. Switchboard globally optimized routes achieve up to 57% higher throughput and 49% lower latency than a distributed load balancing approach in a wide-area testbed. Its routing platform supports line-rate traffic with millions of concurrent flows.
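A toy version of the controller's placement decision makes the CPE/edge/cloud split concrete. The greedy heuristic, site names, and cost numbers below are invented for illustration, not Switchboard's actual optimizer: walk the chain of VNFs, placing each at the lowest-latency site that still has capacity, falling back to remoter sites.

```python
def place_chain(chain, sites, capacity, latency):
    """Greedy wide-area placement: prefer near sites, respect capacity."""
    placement = []
    remaining = dict(capacity)  # copy so callers can reuse their dict
    for vnf in chain:
        for site in sorted(sites, key=lambda s: latency[s]):
            if remaining[site] > 0:
                remaining[site] -= 1
                placement.append((vnf, site))
                break
        else:
            raise RuntimeError(f"no capacity left for {vnf}")
    return placement

sites = ["cpe", "edge", "cloud"]
capacity = {"cpe": 1, "edge": 1, "cloud": 10}  # CPE hosts only part of the chain
latency = {"cpe": 1, "edge": 5, "cloud": 40}   # ms from the customer
chain = ["firewall", "nat", "dpi"]
print(place_chain(chain, sites, capacity, latency))
# [('firewall', 'cpe'), ('nat', 'edge'), ('dpi', 'cloud')]
```

Even this toy shows why a global view matters: the chain spills from the CPE to the edge and then the cloud exactly when nearer resources run out, which a per-site load balancer cannot plan for.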
Citations: 2
Self-adaptive Executors for Big Data Processing
Pub Date : 2019-12-09 DOI: 10.1145/3361525.3361545
Sobhan Omranian Khorasani, Jan S. Rellermeyer, D. Epema
The demand for additional performance due to the rapid increase in the size and importance of data-intensive applications has considerably elevated the complexity of computer architecture. In response, systems offer pre-determined behaviors based on heuristics and then expose a large number of configuration parameters for operators to adjust to their particular infrastructure. Unfortunately, in practice this leads to a substantial manual tuning effort. In this work, we focus on one of the most impactful tuning decisions in big data systems: the number of executor threads. We first show the impact of I/O contention on the runtime of workloads and a simple static solution to reduce the number of threads for I/O-bound phases. We then present a more elaborate solution in the form of self-adaptive executors, which are able to continuously monitor the underlying system resources and detect contention. This enables the executors to tune their thread pool size dynamically at runtime in order to achieve the best performance. Our experimental results show that being adaptive can significantly reduce the execution time, especially in I/O-intensive applications such as Terasort and PageRank, which see a 34% and 54% reduction in runtime, respectively.
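The monitor-and-adapt loop can be sketched as a pure adaptation rule. The concrete policy below (thresholds, halving/incrementing) is invented for illustration: shrink the thread pool while I/O contention is high, grow it back when the workload is compute-bound.

```python
def adapt_pool_size(current, io_wait_ratio, min_size=1, max_size=16):
    """One adaptation step based on a sampled I/O-wait ratio."""
    if io_wait_ratio > 0.5:        # I/O-bound phase: extra threads just contend
        return max(min_size, current // 2)
    if io_wait_ratio < 0.1:        # CPU-bound phase: use available parallelism
        return min(max_size, current + 1)
    return current                 # in between: leave the pool alone

size = 16
for ratio in [0.8, 0.7, 0.05, 0.05]:  # simulated monitoring samples
    size = adapt_pool_size(size, ratio)
print(size)  # 6 (trace: 16 -> 8 -> 4 -> 5 -> 6)
```

Shrinking multiplicatively but growing additively is a deliberate asymmetry in this sketch: contention hurts immediately, so the pool backs off fast, while re-expansion can afford to be cautious.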
Citations: 5
Generalized Consensus for Practical Fault Tolerance
Pub Date : 2019-12-09 DOI: 10.1145/3361525.3361536
Mohit Garg, Sebastiano Peluso, Balaji Arun, B. Ravindran
Despite extensive research on Byzantine Fault Tolerant (BFT) systems, overheads associated with such solutions preclude widespread adoption. Past efforts such as the Cross Fault Tolerance (XFT) model address this problem by making a weaker assumption: that a majority of nodes are correct and communicate synchronously. Although XPaxos of Liu et al. (applying the XFT model) achieves similar performance to Paxos, it does not scale with the number of faults. Also, its reliance on a single leader introduces considerable downtime in case of failures. We present Elpis, the first multi-leader XFT consensus protocol. By adopting the Generalized Consensus specification, we were able to devise a multi-leader protocol that exploits the commutativity property inherent in the commands ordered by the system. Elpis maps accessed objects to non-faulty replicas during periods of synchrony. Subsequently, these replicas order all commands which access these objects. The experimental evaluation confirms the effectiveness of this approach: Elpis achieves up to 2x speedup over XPaxos and up to 3.5x speedup over state-of-the-art Byzantine Fault-Tolerant Consensus Protocols.
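The commutativity idea behind the multi-leader design can be sketched in miniature (all machinery below is invented for illustration; the real protocol handles faults, quorums, and view changes): commands on different objects commute, so each object's owner can order its commands independently instead of funneling everything through one global leader.

```python
class PerObjectSequencer:
    """Toy per-object ordering: one logical sequence per object, owned by
    the replica the object maps to."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.sequences = {}  # object -> next slot at its owner

    def owner(self, obj):
        # deterministic object-to-replica mapping during synchronous periods
        return self.replicas[hash(obj) % len(self.replicas)]

    def order(self, obj, command):
        slot = self.sequences.get(obj, 0)
        self.sequences[obj] = slot + 1
        return (self.owner(obj), obj, slot, command)

seq = PerObjectSequencer(["r1", "r2", "r3"])
a0 = seq.order("acct-a", "deposit 10")
b0 = seq.order("acct-b", "deposit 20")
a1 = seq.order("acct-a", "withdraw 5")
# independent per-object slots: no global order between acct-a and acct-b
assert a0[2] == 0 and a1[2] == 1 and b0[2] == 0
```

Because `acct-a` and `acct-b` never share a sequence, their owners act as concurrent leaders, which is where the speedup over single-leader XPaxos comes from.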
引用次数: 2
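The object-ownership idea in the Elpis abstract — map each object to one replica during synchronous periods and let that replica serialize every command that touches it, so commands on disjoint objects commute and need no cross-leader coordination — can be sketched in Python. This is an illustrative toy model only, not the authors' protocol; the hash-based ownership rule and the `min()` tie-break for multi-object commands are assumptions made for the sketch.

```python
class MultiLeaderOrderer:
    """Toy model of multi-leader ordering over object ownership.

    Each object is owned by exactly one replica; that replica appends
    every command touching the object to its log, so commands on the
    same object are totally ordered while commands on disjoint objects
    proceed in parallel without coordination.
    """

    def __init__(self, replicas):
        self.replicas = list(replicas)
        # one per-leader command log per replica
        self.logs = {r: [] for r in self.replicas}

    def owner(self, obj):
        # Deterministic object -> replica mapping; a stand-in for the
        # synchrony-period ownership assignment described in the abstract.
        return self.replicas[hash(obj) % len(self.replicas)]

    def submit(self, cmd, objects):
        # Collect the owners of all objects the command accesses.
        owners = {self.owner(o) for o in objects}
        # Simple tie-break when a command spans several owners
        # (the real protocol coordinates here; this sketch does not).
        leader = min(owners)
        self.logs[leader].append(cmd)
        return leader
```

Because `owner()` is deterministic within a process, two commands on the same object always land in the same log, in submission order — the commutativity-based serialization the abstract describes, reduced to its simplest form.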
AquaEIS
Pub Date : 2019-12-09 DOI: 10.1145/3361525.3361554
Qing Han, Sharad Mehrotra, N. Venkatasubramanian
Real-time event identification is critical in complex distributed infrastructures, e.g., water systems, where failures are difficult to isolate. We present AquaEIS, an event-based middleware tailored to the problem of locating sources of failure (e.g., contamination) in community water infrastructures. The inherent complexity of underground hydraulic systems combined with aging infrastructure presents unique challenges. AquaEIS combines online learning techniques, model-driven simulators, and data from limited sensing networks to intelligently guide human participants (e.g., staff) in identifying contaminant sources. The framework integrates the necessary abstractions with event processing methods into a workflow that iteratively selects and refines the set of potential failure points for human-driven grab sampling. The integrated platform utilizes Hidden Markov Model (HMM) based representations along with field reports for event inference; reinforcement learning (RL) methods have also shown promise for further refining event locations and reducing the cost of human engagement. Our approach is evaluated in real-world water systems under a range of distinct events. The results show that AquaEIS can significantly reduce the number of sampling cycles while ensuring localization accuracy (detecting 100% of the failure events, compared to a baseline that can identify only 38%).
Citations: 1
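The source-identification step the AquaEIS abstract describes — combining simulator-derived sensor signatures with field observations to rank candidate contamination sources for grab sampling — can be illustrated with a simple Bayesian ranking. This is a minimal sketch, not the paper's HMM/RL pipeline: the `likelihood` table stands in for the hydraulic simulator, the prior is uniform, and all names are assumptions.

```python
def rank_candidate_sources(candidates, sensor_reads, likelihood):
    """Rank candidate failure sources by posterior probability.

    candidates   -- candidate source node IDs
    sensor_reads -- {sensor_id: bool}, True if the sensor alarmed
    likelihood   -- likelihood[src][sensor]: probability the sensor
                    alarms if `src` is the true source (here a stand-in
                    for model-driven simulation output)

    Returns [(posterior, source), ...] sorted best-first; the top
    entries would be the next targets for human grab sampling.
    """
    posteriors = {}
    for src in candidates:
        p = 1.0  # uniform prior over candidates
        for sensor, alarmed in sensor_reads.items():
            p_alarm = likelihood[src][sensor]
            # probability of the observed reading under this source
            p *= p_alarm if alarmed else (1.0 - p_alarm)
        posteriors[src] = p
    total = sum(posteriors.values()) or 1.0  # normalize (guard empty case)
    return sorted(((p / total, s) for s, p in posteriors.items()),
                  reverse=True)
```

Iterating this ranking as new grab-sample results arrive — shrinking the candidate set each cycle — mirrors, in miniature, the select-and-refine workflow the abstract describes.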
eSPICE: Probabilistic Load Shedding from Input Event Streams in Complex Event Processing
Pub Date : 2019-12-09 DOI: 10.1145/3361525.3361548
Ahmad Slo, Sukanya Bhowmik, K. Rothermel
Complex event processing systems process input event streams on the fly. Since the input event rate can exceed the system's capabilities and result in violating a defined latency bound, load shedding is used to drop a portion of the input event streams. The crucial question is how many and which events to drop so that the defined latency bound is maintained and the degradation in the quality of results is minimized. In the stream processing domain, different load shedding strategies have been proposed, but they mainly depend on the importance of individual tuples (events). However, as complex event processing systems perform pattern detection, the importance of an event is also influenced by other events in the same pattern. In this paper, we propose a load shedding framework called eSPICE for complex event processing systems. eSPICE depends on building a probabilistic model that learns the importance of events within a window. The position of an event in a window and its type are used as features to build the model. Further, we provide algorithms to decide when to start dropping events and how many events to drop. Moreover, we extensively evaluate the performance of eSPICE on two real-world datasets.
Citations: 17
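The model the eSPICE abstract describes — learn a utility for each (event type, position-in-window) pair and, when the input rate exceeds capacity, drop the lowest-utility events — can be sketched as follows. This is an assumption-laden toy, not the authors' implementation: the running-average utility estimate and the fixed drop budget are simplifications made for the sketch.

```python
from collections import defaultdict


class UtilityShedder:
    """Toy probabilistic load shedder keyed on (event type, window position).

    observe() trains the model from past windows: did an event of this
    type at this position contribute to a pattern match?  shed() then
    drops the events with the lowest learned utility.
    """

    def __init__(self, window_size):
        self.window_size = window_size
        self.utility = defaultdict(float)  # (type, pos) -> est. match contribution
        self.counts = defaultdict(int)

    def observe(self, event_type, position, contributed):
        # Incremental running average of the contribution probability.
        key = (event_type, position)
        self.counts[key] += 1
        self.utility[key] += (contributed - self.utility[key]) / self.counts[key]

    def shed(self, window, drop_count):
        """Return `window` (a list of event types) with the `drop_count`
        lowest-utility events removed."""
        ranked = sorted(range(len(window)),
                        key=lambda i: self.utility[(window[i], i)])
        to_drop = set(ranked[:drop_count])
        return [e for i, e in enumerate(window) if i not in to_drop]
```

A full system would also decide *when* to shed and *how many* events to drop from the observed input rate and latency bound, as the abstract notes; this sketch only covers the *which* question.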
Journal
Proceedings of the 20th International Middleware Conference