D. Dean, Peipei Wang, Xiaohui Gu, W. Enck, Guoliang Jin
It is notoriously difficult to diagnose server hang bugs as they often generate little diagnostic information and are difficult to reproduce offline. In this paper, we present a characteristic study of 177 real software hang bugs from 8 common open source server systems (i.e., Apache, Lighttpd, MySQL, Squid, HDFS, Hadoop MapReduce, Tomcat, Cassandra). We identify three major root cause categories (i.e., programmer errors, mishandled values, concurrency issues). We then describe two major problems (i.e., false positives and false negatives) that arise when applying existing rule-based bug detection techniques to those bugs.
{"title":"Automatic Server Hang Bug Diagnosis: Feasible Reality or Pipe Dream?","authors":"D. Dean, Peipei Wang, Xiaohui Gu, W. Enck, Guoliang Jin","doi":"10.1109/ICAC.2015.52","DOIUrl":"https://doi.org/10.1109/ICAC.2015.52","url":null,"abstract":"It is notoriously difficult to diagnose server hang bugs as they often generate little diagnostic information and are difficult to reproduce offline. In this paper, we present a characteristic study of 177 real software hang bugs from 8 common open source server systems (i.e., Apache, Lighttpd, My SQL, Squid, HDFS, Hadoop Mapreduce, Tomcat, Cassandra). We identify three major root cause categories (i.e., Programmer errors, mishandled values, concurrency issues). We then describe two major problems (i.e., False positives and false negatives) while applying existing rule-based bug detection techniques to those bugs.","PeriodicalId":6643,"journal":{"name":"2015 IEEE International Conference on Autonomic Computing","volume":"141 1","pages":"127-132"},"PeriodicalIF":0.0,"publicationDate":"2015-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83031933","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Anshul Gandhi, Parijat Dube, Andrzej Kochut, Li Zhang
In this paper, we present the design and implementation of a model-driven autoscaling solution for Hadoop clusters. We first develop novel performance models for Hadoop workloads that relate job completion times to various workload and system parameters such as input size and resource allocation. We then employ statistical techniques to tune the models for specific workloads, including Terasort and K-means. Finally, we employ the tuned models to determine the resources required to successfully complete the Hadoop jobs as per the user-specified response time SLA. We implement our solution on an OpenStack-based cloud cluster running Hadoop. Our experimental results across different workloads demonstrate the autoscaling capabilities of our solution and show that it enables significant resource savings without compromising performance.
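As a rough illustration of the last step described above (using a tuned model to pick resources for a response time SLA), the following minimal sketch searches for the smallest cluster size whose predicted completion time meets the SLA. The abstract does not give the form of the performance model, so the model here (a fixed overhead plus perfectly parallel work per GB of input) and all names and parameters (`predicted_completion_time`, `min_nodes_for_sla`, `fixed_s`, `per_gb_s`, `max_nodes`) are assumptions made only for this example.

```python
def predicted_completion_time(input_gb, nodes, fixed_s, per_gb_s):
    """Hypothetical model: fixed overhead plus work that parallelizes evenly.

    This is NOT the model from the paper; it is an assumed stand-in.
    """
    return fixed_s + per_gb_s * input_gb / nodes


def min_nodes_for_sla(input_gb, sla_s, fixed_s, per_gb_s, max_nodes=1000):
    """Return the smallest node count predicted to meet the SLA, or None."""
    for nodes in range(1, max_nodes + 1):
        if predicted_completion_time(input_gb, nodes, fixed_s, per_gb_s) <= sla_s:
            return nodes
    return None  # the SLA is unreachable under this model within max_nodes


if __name__ == "__main__":
    # 100 GB input, 300 s SLA, 60 s fixed overhead, 12 s per GB on one node
    print(min_nodes_for_sla(100, 300, 60, 12))
```

In this sketch the coefficients `fixed_s` and `per_gb_s` play the role of the per-workload parameters that would be fitted statistically; a linear scan is used because the assumed model is monotone in the node count and the search space is small.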
{"title":"Model-Driven Autoscaling for Hadoop Clusters","authors":"Anshul Gandhi, Parijat Dube, Andrzej Kochut, Li Zhang","doi":"10.1109/ICAC.2015.50","DOIUrl":"https://doi.org/10.1109/ICAC.2015.50","url":null,"abstract":"In this paper, we present the design and implementation of a model-driven auto scaling solution for Hadoop clusters. We first develop novel performance models for Hadoop workloads that relate job completion times to various workload and system parameters such as input size and resource allocation. We then employ statistical techniques to tune the models for specific workloads, including Terasort and K-means. Finally, we employ the tuned models to determine the resources required to successfully complete the Hadoop jobs as per the user-specified response time SLA. We implement our solution on an Open Stack-based cloud cluster running Hadoop. Our experimental results across different workloads demonstrate the auto scaling capabilities of our solution, and enable significant resource savings without compromising performance.","PeriodicalId":6643,"journal":{"name":"2015 IEEE International Conference on Autonomic Computing","volume":"28 1","pages":"155-156"},"PeriodicalIF":0.0,"publicationDate":"2015-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85387901","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Over the past few years, many research projects have begun to focus on swarms of mobile unmanned systems (e.g., drones, ground robots), collectively referred to as UMS. These systems, because of the many sensors and actuators they can embed, are suitable, for instance, for autonomous missions in 3D (Dull, Dirty and Dangerous) environments. However, embedding a large number of capabilities in all members of a swarm is expensive in terms of cost, weight and energy consumption. Thus, it is usually more efficient to embed only a single or a few capabilities within each UMS. It then becomes necessary to provide a discovery mechanism built into the swarm in order to allow its members to share their capabilities and to collaborate to achieve a global mission. These shared capabilities are called services. In this paper, we propose a new service discovery system called AMiRALE (Asynchronous Missions Relay for Autonomous and Lively Entities), dedicated to highly volatile, autonomous and mobile swarms of UMS. Our solution is independent of both nodes' mobility and connectivity patterns. Moreover, it supports heterogeneous swarms and degraded conditions of operation (i.e., message loss, UMS loss and disconnected network). It is also totally decentralized and enables both discovery and service usage. We provide a description of the theoretical model of our AMiRALE system as well as several simulation results obtained from a park cleaning scenario.
{"title":"A Mission-Oriented Service Discovery Mechanism for Highly Dynamic Autonomous Swarms of Unmanned Systems","authors":"Vincent Autefage, S. Chaumette, D. Magoni","doi":"10.1109/ICAC.2015.28","DOIUrl":"https://doi.org/10.1109/ICAC.2015.28","url":null,"abstract":"Over the past few years, many research projects have begun to focus on swarms of mobile unmanned systems (e.g., Drones, ground robots) globally referred as UMS. These systems, because of the many sensors and actuators they can embed, are suitable for autonomous missions in 3D (Dull, Dirty and Dangerous) environments for instance. However, embedding a large number of capabilities in all of members of a swarm is expensive in terms of cost, weight and energy consumption. Thus, it is usually more efficient to embed only a single or a few capabilities within each UMS. It then becomes necessary to provide a discovery mechanism built into the swarm in order to allow its members to share their capabilities and to collaborate for achieving a global mission. These shared capabilities are called services. In this paper, we propose a new service discovery system called AMiRALE for Asynchronous Missions Relay for Autonomous and Lively Entities dedicated to highly volatile, autonomous and mobile swarms of UMS. Our solution is independent of both nodes' mobility and connectivity patterns. Moreover, it supports heterogeneous swarms and degraded conditions of operation (i.e., Message loss, UMS loss and disconnected network). It is also totally decentralized and enables both discovery and service usage. 
We provide a description of the theoretical model of our AMiRALE system as well as several simulation results obtained from a park cleaning scenario.","PeriodicalId":6643,"journal":{"name":"2015 IEEE International Conference on Autonomic Computing","volume":"38 3 1","pages":"31-40"},"PeriodicalIF":0.0,"publicationDate":"2015-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83666371","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A key requirement to realize modern distributed systems is the ability of systems to autonomously adapt their behavior to changing environmental conditions at runtime, to preserve their operation even in the presence of uncertain changes. In order to achieve this, the different parts of such a self-organizing system have to be coordinated to achieve meaningful adaptations. To avoid single points of failure, decentralized control is a key element for the realization of robust and scalable self-adaptation. This paper proposes both a middleware and an engineering approach to realize different decentralized control structures for distributed self-organizing systems. The presented work picks up the concept of Active Components as a design element for loosely-coupled distributed systems and extends it by the proposed middleware and engineering approach. Active Components are conceptually based on the Service Component Architecture but extend the component concept with a concurrency model. They resemble software agents as each component is not only a passive service provider but also provides additional autonomous behavior.
{"title":"Middleware for Constructing Decentralized Control in Self-Organizing Systems","authors":"T. Preisler, Tim Dethlefs, W. Renz","doi":"10.1109/ICAC.2015.56","DOIUrl":"https://doi.org/10.1109/ICAC.2015.56","url":null,"abstract":"A key requirement to realize modern distributed systems is the ability of systems to autonomously adapt their behavior to changing environmental conditions at runtime, to preserve their operation even in the presence of uncertain changes. In order achieve this, the different parts of such a self-organizing system have to be coordinated to achieve meaningful adaptations. To avoid single point of failures, decentralized control is a key element for the realization of robust and scalable self-adaptation. This paper proposes both a middleware, as well as an engineering approach to realize different decentralized control structures for distributed self-organizing systems. The presented work picks up the concept of Active Components as a design element for loosely-coupled distributed systems and extends it by the proposed middleware and engineering approach. Active Components are conceptually based on the Service Component Architecture but extend the component concept with a concurrency model. They resemble software agents as each component is not only a passive service provider but also provides additional autonomous behavior.","PeriodicalId":6643,"journal":{"name":"2015 IEEE International Conference on Autonomic Computing","volume":"12 1","pages":"325-330"},"PeriodicalIF":0.0,"publicationDate":"2015-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83495076","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Distributed dynamic virtual machine (VM) consolidation (DDVMC) is a virtual machine management strategy that uses a distributed rather than a centralized algorithm for finding the right balance between saving energy and attaining the best possible performance in cloud data centers. One of the significant challenges in DDVMC is that the optimality of this strategy is highly dependent on the quality of the decision-making process. In this paper we propose a cooperative multi-agent learning approach to tackle this challenge. The experimental results show that our approach yields far better results with respect to the energy-performance tradeoff in cloud data centers in comparison to state-of-the-art algorithms.
{"title":"Dynamic Virtual Machine Consolidation: A Multi Agent Learning Approach","authors":"S. Masoumzadeh, H. Hlavacs","doi":"10.1109/ICAC.2015.17","DOIUrl":"https://doi.org/10.1109/ICAC.2015.17","url":null,"abstract":"Distributed dynamic virtual machine (VM) consolidation (DDVMC) is a virtual machine management strategy that uses a distributed rather than a centralized algorithm for finding a right balance between saving energy and attaining best possible performance in cloud data center. One of the significant challenges in DDVMC is that the optimality of this strategy is highly dependent on the quality of the decision-making process. In this paper we propose a cooperative multi agent learning approach to tackle this challenge. The experimental results show that our approach yields far better results w.r.t. The energy-performance tradeoff in cloud data centers in comparison to state-of-the-art algorithms.","PeriodicalId":6643,"journal":{"name":"2015 IEEE International Conference on Autonomic Computing","volume":"121 1","pages":"161-162"},"PeriodicalIF":0.0,"publicationDate":"2015-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88259688","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Service-oriented computing has been successfully adopted by the industry. However, this raises new challenges, especially with respect to service selection and ranking in dynamic environments. Current solutions for service selection and ranking lack the flexibility to handle dynamic environments. This paper proposes to integrate algorithms based on the Formal Concept Analysis theory to extend service-oriented component models. This solution improves the self-adaptation of service-oriented component models. The resulting framework externalizes service selection and ranking. Results are integrated in the Apache Felix iPOJO component model.
{"title":"Self-Adaptation of Service Bindings Based on Formal Concept Analysis","authors":"Stéphanie Chollet","doi":"10.1109/ICAC.2015.26","DOIUrl":"https://doi.org/10.1109/ICAC.2015.26","url":null,"abstract":"Service-oriented computing has been successfully adopted by the industry. This raises however new challenges, especially with respect to service selection and ranking in dynamic environments. Current solutions for service selection and ranking lack flexibility to handle dynamic environments. This paper proposes to integrate algorithms based on the Formal Concept Analysis theory to extend service-oriented component models. This solution improves the self-adaptation of service-oriented component models. The resulting framework externalizes service selection and ranking. Results are integrated in the Apache Felix iPOJO component model.","PeriodicalId":6643,"journal":{"name":"2015 IEEE International Conference on Autonomic Computing","volume":"1 1","pages":"211-214"},"PeriodicalIF":0.0,"publicationDate":"2015-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90737260","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jaimie Kelley, Christopher Stewart, Nathaniel Morris, Devesh Tiwari, Yuxiong He, S. Elnikety
Online data-intensive services parallelize query execution across distributed software components. Interactive response time is a priority, so online query executions return answers without waiting for slow running components to finish. However, data from these slow components could lead to better answers. We propose Ubora, an approach to measure the effect of slow running components on the quality of answers. Ubora randomly samples online queries and executes them twice. The first execution elides data from slow components and provides fast online answers; the second execution waits for all components to complete. Ubora uses memoization to speed up mature executions by replaying network messages exchanged between components. Our systems-level implementation works for a wide range of platforms, including Hadoop/Yarn, Apache Lucene, the EasyRec Recommendation Engine, and the OpenEphyra question answering system. Ubora computes answer quality much faster than competing approaches that do not use memoization. With Ubora, we show that answer quality can and should be used to guide online admission control. Our adaptive controller processed 37% more queries than a competing controller guided by the rate of timeouts.
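The sample-and-execute-twice idea in the abstract above can be sketched in a few lines. This is only a toy model under stated assumptions: components are represented as `(latency_ms, results)` pairs, answer quality is assumed to be the fraction of the complete ("mature") answer that also appears in the online answer, and memoization of network messages is not modeled at all. The function names (`execute`, `answer_quality`, `sample_and_measure`) are hypothetical and do not come from Ubora.

```python
import random


def execute(components, timeout_ms=None):
    """Merge results from all components, skipping any slower than timeout_ms."""
    results = set()
    for latency_ms, docs in components:
        if timeout_ms is None or latency_ms <= timeout_ms:
            results |= set(docs)
    return results


def answer_quality(online, mature):
    """Assumed metric: share of the mature answer present in the online answer."""
    if not mature:
        return 1.0
    return len(online & mature) / len(mature)


def sample_and_measure(queries, timeout_ms, sample_rate, rng):
    """Score the online answers of a random sample of queries."""
    scores = {}
    for name, components in queries.items():
        online = execute(components, timeout_ms)  # fast answer, slow data elided
        if rng.random() < sample_rate:
            mature = execute(components)          # waits for every component
            scores[name] = answer_quality(online, mature)
    return scores


if __name__ == "__main__":
    queries = {"q1": [(10, ["a", "b"]), (500, ["c", "d"])]}
    print(sample_and_measure(queries, 100, 1.0, random.Random(0)))
```

A controller of the kind the abstract mentions could then, for example, admit fewer queries when the sampled scores fall below a target; that policy is not shown here.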
{"title":"Measuring and Managing Answer Quality for Online Data-Intensive Services","authors":"Jaimie Kelley, Christopher Stewart, Nathaniel Morris, Devesh Tiwari, Yuxiong He, S. Elnikety","doi":"10.1109/ICAC.2015.33","DOIUrl":"https://doi.org/10.1109/ICAC.2015.33","url":null,"abstract":"Online data-intensive services parallelize query execution across distributed software components. Interactive response time is a priority, so online query executions return answers without waiting for slow running components to finish. However, data from these slow components could lead to better answers. We propose Ubora, an approach to measure the effect of slow running components on the quality of answers. Ubora randomly samples online queries and executes them twice. The first execution elides data from slow components and provides fast online answers, the second execution waits for all components to complete. Ubora uses memoization to speed up mature executions by replaying network messages exchanged between components. Our systems-level implementation works for a wide range of platforms, including Hadoop/Yarn, Apache Lucene, the Easy Rec Recommendation Engine, and the Open Ephyra question answering system. Ubora computes answer quality much faster than competing approaches that do not use memoization. With Ubora, we show that answer quality can and should be used to guide online admission control. 
Our adaptive controller processed 37% more queries than a competing controller guided by the rate of timeouts.","PeriodicalId":6643,"journal":{"name":"2015 IEEE International Conference on Autonomic Computing","volume":"PP 1","pages":"167-176"},"PeriodicalIF":0.0,"publicationDate":"2015-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84353802","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2006-12-15, DOI: 10.1201/9781420009354.PT4
M. Bennani, D. Menascé
{"title":"Dynamic Server Allocation for Autonomic Service Centers in the Presence of Failures","authors":"M. Bennani, D. Menascé","doi":"10.1201/9781420009354.PT4","DOIUrl":"https://doi.org/10.1201/9781420009354.PT4","url":null,"abstract":"","PeriodicalId":6643,"journal":{"name":"2015 IEEE International Conference on Autonomic Computing","volume":"63 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2006-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77241523","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2006-12-15, DOI: 10.1201/9781420009354.CH13
Ian Whalley, James E. Hanson, Steve R. White, D. Chess, J. Kephart
{"title":"Dynamic Collaboration in Autonomic Computing","authors":"Ian Whalley, James E. Hanson, Steve R. White, D. Chess, J. Kephart","doi":"10.1201/9781420009354.CH13","DOIUrl":"https://doi.org/10.1201/9781420009354.CH13","url":null,"abstract":"","PeriodicalId":6643,"journal":{"name":"2015 IEEE International Conference on Autonomic Computing","volume":"137 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2006-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73374225","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2006-12-15, DOI: 10.1201/9781420009354.CH15
S. Rafaeli, R. Adams, D. Milojicic, Subu Iyer, P. Brett, V. Talwar
Modern computing environments, such as enterprise data centers, Grids, and PlanetLab, introduce distributed services to address scalability, locality, and reliability. Web Services (WS), in particular, improve decoupling, decentralization, and autonomicity within distributed systems. Unfortunately, scale and decentralization introduce additional problems in distributed services management, such as deployment, monitoring, and lifecycle maintenance. In this paper, we propose a new approach to the management of large-scale distributed services, based on three artifacts: scalable publish-subscribe eventing, scalable WS-based deployment, and model-based management. We demonstrate that these techniques improve the manageability of services. In this way, we enable service developers to focus on the development of service functionality rather than on management features.
{"title":"Scalable Management — Technologies for Management of Large-Scale, Distributed Systems","authors":"S. Rafaeli, R. Adams, D. Milojicic, Subu Iyer, P. Brett, V. Talwar","doi":"10.1201/9781420009354.CH15","DOIUrl":"https://doi.org/10.1201/9781420009354.CH15","url":null,"abstract":"Modern computing environments, such as enterprise data centers, Grids, and PlanetLab, introduce distributed services to address scalability, locality, and reliability. Web Services (WS), in particular, improve decoupling, decentralization, and autonomicity within distributed systems. Unfortunately, scale and decentralization introduce additional problems in distributed services management, such as deployment, monitoring, and lifecycle maintenance. In this paper, we propose a new approach to management of large scale distributed services, based on three artifacts: scalable publish-subscribe eventing, scalable WS-based deployment, and model-based management. We demonstrate that these techniques improve the manageability of services. In this way we enable service developers to focus on the development of service functionality rather than on management features.","PeriodicalId":6643,"journal":{"name":"2015 IEEE International Conference on Autonomic Computing","volume":"55 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2006-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83107898","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}