Self-adaptive systems are widely recognized as the future of computer systems. Due to their dynamic and evolving nature, the characterization of self-adaptation and resilience attributes is of utmost importance. The problem is that there is currently no practical way to characterize self-adaptation capabilities or to compare alternative solutions with respect to resilience. In this paper we discuss the problem of resilience benchmarking of self-adaptive systems. We start by identifying a set of key challenges and then propose a research roadmap to tackle those challenges.
{"title":"Benchmarking the Resilience of Self-Adaptive Systems: A New Research Challenge","authors":"Raquel Almeida, H. Madeira, M. Vieira","doi":"10.1109/SRDS.2010.50","DOIUrl":"https://doi.org/10.1109/SRDS.2010.50","url":null,"abstract":"Self-adaptive systems are widely recognized as the future of computer systems. Due to their dynamic and evolving nature, the characterization of self-adaptation and resilience attributes is of utmost importance. The problem is that nowadays there is no practical way to characterize self-adaptation capabilities or to compare alternative solutions concerning resilience. In this paper we discuss the problem of resilience benchmarking of self-adaptive systems. We start by identifying a set of key challenges and then propose a research roadmap to tackle those challenges.","PeriodicalId":219204,"journal":{"name":"2010 29th IEEE Symposium on Reliable Distributed Systems","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125254778","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Service Oriented Architecture (SOA) is an architectural pattern providing agility to align technical solutions to modular business services that are decoupled from service consumers. Service capabilities such as interface options, quality of service (QoS), throughput, security and other constraints are described in the Service Level Agreement (SLA), which would typically be published in the service registry (UDDI) for use by consumers and/or mediation mechanisms. For mobile data streaming applications, problems arise when a service provider's SLA attributes cannot be mapped one-to-one to the service consumers (e.g., a 150 MB/sec video stream provider serving a 5 MB/sec data consumer). In this paper we present a generic framework prototype for managing and disseminating streaming data within an SOA environment as an alternative to custom service implementations based upon specific consumers or data types. Based on this framework, we implemented a set of services: Stream Discovery Service, Stream Multiplexor/Demultiplexor (routing) Service, Stream Brokering Service, Stream Repository Service and Stream Filtering Service to demonstrate the flexibility of such a streaming data framework within an SOA environment.
{"title":"Towards Mobile Data Streaming in Service Oriented Architecture","authors":"Norman Ahmed, M. Linderman, Jason Bryant","doi":"10.1109/SRDS.2010.45","DOIUrl":"https://doi.org/10.1109/SRDS.2010.45","url":null,"abstract":"Service Oriented Architecture (SOA) is an architectural pattern providing agility to align technical solutions to modular business services that are decoupled from service consumers. Service capabilities such as interface options, quality of service (QoS), throughput, security and other constraints are described in the Service Level Agreement (SLA) that would typically be published in the service registry (UDDI) for use by consumers and/or mediation mechanisms. For mobile data streaming applications, problems arise when a service provider’s SLA attributes cannot be mapped one-to-one to the service consumers (i.e. 150MB/sec video stream service provider to 5MB/sec data consumer). In this paper we present a generic framework prototype for managing and disseminating streaming data within a SOA environment as an alternative to custom service implementations based upon specific consumers or data types. 
Based on this framework, we implemented a set of services: Stream Discovery Service, Stream Multiplexor / Demultiplexor(routing) Service, Stream Brokering Service, Stream Repository Service and Stream Filtering Service to demonstrate the flexibility of such a streaming data framework within SOA environment.","PeriodicalId":219204,"journal":{"name":"2010 29th IEEE Symposium on Reliable Distributed Systems","volume":"410 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125402049","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
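The SLA rate mismatch the abstract highlights (a provider emitting 150 MB/sec to a consumer limited to 5 MB/sec) boils down to rate adaptation inside the mediation layer. The following credit-based adapter is a minimal, hypothetical sketch of that idea, not the paper's actual Stream Brokering Service:

```python
def adapt_stream(chunk_sizes, consumer_rate):
    """Token-bucket-style adaptation of a stream to a consumer's SLA rate.

    chunk_sizes: MB produced by the provider at each tick. Each tick accrues
    `consumer_rate` MB of credit; a chunk is forwarded only if the credit
    covers it, otherwise it is dropped (a real broker might transcode instead).
    Returns the MB actually delivered per tick.
    """
    credit = 0.0
    delivered = []
    for size in chunk_sizes:
        credit += consumer_rate          # credit accrued this tick
        if size <= credit:               # enough credit: forward the chunk
            credit -= size
            delivered.append(size)
        else:                            # otherwise drop it
            delivered.append(0.0)
    return delivered
```

With a 5 MB/sec SLA, full 150 MB chunks never fit the budget, while a stream already shaped to 5 MB chunks passes through unchanged.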
Computing systems are becoming the heart of modern technology, implementing critical tasks that were previously demanded of humans and that imply human interaction. This highlights the problem of dependability in computer science contexts. High availability computing/clusters are a possible solution in such cases, implementing standby redundancy as a trade-off between dependability and costs. From the engineering perspective, this implies the use of specific techniques and tools for adequately evaluating the reliability/availability of high availability clusters, also taking into account dependencies among nodes (standby, repair, etc.) and the effect of wear and tear on such nodes, especially when failure and repair times are not exponentially distributed. The solution proposed in this paper is based on the use of phase-type distributions and Kronecker algebra. In fact, we represent the reliability and maintainability of each component by specific phase-type distributions, whose interactions describe the system availability. The latter is thus modeled by an expanded Markov chain expressed in terms of Kronecker algebra, in order to face the state space explosion problem of expansion techniques and to represent the memory policies related to the aging process. More specifically, the paper first details the technique and then applies it to the evaluation of a standby redundant system representing a high availability cluster, taken as an example with the aim of demonstrating its effectiveness. Moreover, in order to show the potential of the technique, different maintenance strategies are evaluated and compared.
{"title":"Availability Assessment of HA Standby Redundant Clusters","authors":"S. Distefano, F. Longo, M. Scarpa","doi":"10.1109/SRDS.2010.37","DOIUrl":"https://doi.org/10.1109/SRDS.2010.37","url":null,"abstract":"Computing systems are becoming the heart of modern technology, implementing critical tasks usually demanded to and implying human interactions. This highlights the problem of dependability in computer science contexts. High availability computing/clusters is a possible solution in such cases, implementing standby redundancy as a trade-off between dependability and costs. From the engineering perspective, this implies the use of specific techniques and tools for adequately evaluating the reliability/availability of high availability clusters, also taking into account dependencies among nodes (standby, repair, etc.) and the effect of wear and tear into such nodes, especially when failure and repair times are not exponentially distributed. The solution proposed in this paper is based on the use of phase type distributions and Kronecker algebra. In fact, we represent the reliability and maintainability of each component by specific phase type distributions, whose interactions describe the system availability. This latter is thus modeled by an expanded Markov chain expressed in terms of Kronecker algebra in order to face the state space explosion problem of expansion techniques and to represent the memory policies related to the aging process. More specifically, the paper firstly details the technique and then applies it to the evaluation of a standby redundant system representing a high availability cluster taken as example with the aim of demonstrating its effectiveness. 
Moreover, in order to show the potentiality of the technique, different maintenance strategies are evaluated and therefore compared.","PeriodicalId":219204,"journal":{"name":"2010 29th IEEE Symposium on Reliable Distributed Systems","volume":"84 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124131796","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
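The core idea above, composing per-component failure/repair models with Kronecker algebra and reading availability off the joint Markov chain, can be illustrated numerically. This sketch uses plain exponential (rather than phase-type) distributions for brevity, so the simplifications are ours, not the paper's:

```python
import numpy as np

def kron_sum(Q1, Q2):
    """Generator of two independent CTMCs evolving jointly (Kronecker sum)."""
    n1, n2 = Q1.shape[0], Q2.shape[0]
    return np.kron(Q1, np.eye(n2)) + np.kron(np.eye(n1), Q2)

def steady_state(Q):
    """Solve pi @ Q = 0 subject to sum(pi) = 1 via least squares."""
    n = Q.shape[0]
    A = np.vstack([Q.T, np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

lam, mu = 0.01, 1.0                        # failure / repair rates (exponential here)
Q = np.array([[-lam, lam], [mu, -mu]])     # one node: state 0 = up, state 1 = down
QQ = kron_sum(Q, Q)                        # joint chain: (up,up), (up,down), (down,up), (down,down)
pi = steady_state(QQ)
availability = 1.0 - pi[3]                 # cluster is up unless both nodes are down
```

For independent exponential nodes this reproduces the closed form 1 − (λ/(λ+μ))²; phase-type failure and repair times would replace each 2-state block with a larger sub-generator, which is where the Kronecker representation pays off.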
One way to implement fault-tolerant applications is to store their current state in stable memory and, when a failure occurs, restart the application from the last globally consistent state. If the number of simultaneous failures is expected to be small, a diskless checkpointing approach can be used, in which a failed process's state can be determined by accessing only non-faulty processes' memory. In the literature, diskless checkpointing is usually based on synchronous protocols or on properties of the application. In this paper we present a quasi-synchronous diskless checkpointing algorithm, called RDT-Diskless, based on Rollback-Dependency Trackability. The proposed algorithm includes a garbage collection approach that limits the number of checkpoints that must be kept in memory. A framework, called Cheops, was developed and experimental results were obtained from a commercial cloud environment.
{"title":"Diskless Checkpointing with Rollback-Dependency Trackability","authors":"R. Menderico, Islene C. Garcia","doi":"10.1109/SRDS.2010.17","DOIUrl":"https://doi.org/10.1109/SRDS.2010.17","url":null,"abstract":"One way to implement fault tolerant applications is storing its current state in stable memory and, when a failure occurs, restart the application from the last global consistent state. If the number of simultaneous failures is expected to be small a diskless checkpointing approach can be used, where a failed process’s state can be determined only accessing non-faulty process’s memory. In the literature diskless checkpointing is usually based on synchronous protocols or properties of the application. In this paper we present a quasi-synchronous diskless checkpointing algorithm, called RDT-Diskless, based on Rollback-Dependency Trackability. The proposed algorithm includes a garbage collection approach that limits the number of checkpoints that must be kept in memory. A framework, called Cheops, was developed and experimental results were obtained from a commercial cloud environment.","PeriodicalId":219204,"journal":{"name":"2010 29th IEEE Symposium on Reliable Distributed Systems","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115050361","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
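To illustrate the diskless idea (a failed process's state recovered purely from other processes' memory), here is a minimal XOR-parity sketch, tolerating a single failure; the actual RDT-Diskless protocol and its quasi-synchronous checkpoint coordination are far richer than this:

```python
def parity_checkpoint(states):
    """XOR parity over equal-length in-memory process states (bytes)."""
    parity = bytearray(len(states[0]))
    for state in states:
        for i, b in enumerate(state):
            parity[i] ^= b
    return bytes(parity)

def recover(parity, surviving_states):
    """Rebuild the single failed process's state from parity plus survivors."""
    lost = bytearray(parity)
    for state in surviving_states:
        for i, b in enumerate(state):
            lost[i] ^= b
    return bytes(lost)
```

Because XOR is its own inverse, XOR-ing the parity block with every surviving state cancels their contributions and leaves exactly the lost state; no disk access is needed.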
Gossip protocols are an efficient and reliable way to disseminate information. These protocols nevertheless have a drawback: they are unable to limit the dissemination of spam messages. Indeed, messages are redundantly disseminated in the network, and it is enough for a small subset of nodes to forward spam messages to have them received by a majority of nodes. In this paper, we present FireSpam, a gossiping protocol that is able to limit spam dissemination. FireSpam organizes nodes in a ladder topology, where nodes highly capable of filtering spam are at the top of the ladder, whereas nodes with a low spam filtering capability are at the bottom. Messages are disseminated from the bottom of the ladder to its top. The ladder thus acts as a progressive spam filter. In order to make it usable in practice, we designed FireSpam in the BAR model. This model takes into account selfish and malicious behaviors. We evaluate FireSpam using simulations. We show that it drastically limits the dissemination of spam messages, while still ensuring reliable dissemination of good messages.
{"title":"FireSpam: Spam Resilient Gossiping in the BAR Model","authors":"Sonia Ben Mokhtar, Alessio Pace, Vivien Quéma","doi":"10.1109/SRDS.2010.33","DOIUrl":"https://doi.org/10.1109/SRDS.2010.33","url":null,"abstract":"Gossip protocols are an efficient and reliable way to disseminate information. These protocols have nevertheless a drawback: they are unable to limit the dissemination of spam messages. Indeed, messages are redundantly disseminated in the network and it is enough that a small subset of nodes forward spam messages to have them received by a majority of nodes. In this paper, we present FireSpam, a gossiping protocol that is able to limit spam dissemination. FireSpam organizes nodes in a ladder topology, where nodes highly capable of filtering spam are at the top of the ladder, whereas nodes with a low spam filtering capability are at the bottom of the ladder. Messages are disseminated from the bottom of the ladder to its top. The ladder does thus act as a progressive spam filter. In order to make it usable in practice, we designed FireSpam in the BAR model. This model takes into account selfish and malicious behaviors. We evaluate FireSpam using simulations. We show that it drastically limits the dissemination of spam messages, while still ensuring reliable dissemination of good messages.","PeriodicalId":219204,"journal":{"name":"2010 29th IEEE Symposium on Reliable Distributed Systems","volume":"142 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132309934","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
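The ladder-as-progressive-filter mechanism can be sketched in a few lines: a message climbs the levels bottom-up, and each level drops spam with its own detection probability. This toy model ignores the gossip exchanges and the BAR-model incentives of the real protocol:

```python
import random

def disseminate(is_spam, detect_probs, rng):
    """Climb the ladder bottom-up; each level drops spam with its own probability.

    detect_probs are ordered from the bottom (weak filters) to the top (strong
    filters). Good messages are always forwarded. Returns True if the message
    reaches the top of the ladder.
    """
    for p in detect_probs:
        if is_spam and rng.random() < p:
            return False                     # this level filtered the spam out
    return True
```

Even with modest per-level detection probabilities, the probability that spam survives every level shrinks multiplicatively with ladder height, while good messages always pass.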
Fatemeh Borran, Martin Hutle, Nuno Santos, A. Schiper
We introduce the notion of a swift algorithm. Informally, an algorithm that solves repeated consensus is swift if, in a partially synchronous run of this algorithm, eventually no timeout expires, i.e., the algorithm execution proceeds at the actual speed of the system. This definition differs from other efficiency criteria for partially synchronous systems. Furthermore, we show that the notion of swiftness explains why failure-detector-based algorithms are typically more efficient than round-based algorithms: the former are naturally swift while the latter are naturally non-swift. We show that this is not an inherent difference between the models, and provide a round implementation that is swift, therefore performing similarly to failure detector algorithms while maintaining the advantages of the round model.
{"title":"Swift Algorithms for Repeated Consensus","authors":"Fatemeh Borran, Martin Hutle, Nuno Santos, A. Schiper","doi":"10.1109/SRDS.2010.18","DOIUrl":"https://doi.org/10.1109/SRDS.2010.18","url":null,"abstract":"We introduce the notion of a swift algorithm. Informally, an algorithm that solves the repeated consensus is swift if, in a partial synchronous run of this algorithm, eventually no timeout expires, i.e., the algorithm execution proceeds with the actual speed of the system. This definition differs from other efficiency criteria for partial synchronous systems. Furthermore, we show that the notion of swiftness explains why failure detector based algorithms are typically more efficient than round-based algorithms, since the former are naturally swift while the latter are naturally non-swift. We show that this is not an inherent difference between the models, and provide a round implementation that is swift, therefore performing similarly to failure detector algorithms while maintaining the advantages of the round model.","PeriodicalId":219204,"journal":{"name":"2010 29th IEEE Symposium on Reliable Distributed Systems","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115475850","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
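The notion of swiftness can be made concrete with a toy round loop: progress is driven by the arrival of n − f messages, and the timeout is only a fallback, so in well-behaved runs no timeout ever expires. This is our illustration of the concept, not the paper's algorithm:

```python
def run_round(inbox, n, f, timeout_ticks):
    """One communication round: deliver as soon as n - f messages arrive.

    inbox holds the lists of messages received at each tick. The timeout is
    only a fallback; when enough messages arrive early, the round completes
    at the actual speed of the system and no timeout expires (swiftness).
    Returns (received_messages, timed_out).
    """
    received = []
    for tick, msgs in enumerate(inbox):
        received.extend(msgs)
        if len(received) >= n - f:
            return received, False           # progress driven by arrivals
        if tick + 1 >= timeout_ticks:
            return received, True            # fallback: the timeout expired
    return received, True
```

A non-swift round implementation would instead always wait the full `timeout_ticks`, pacing the system at the timeout rather than at message speed.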
Testing web services for robustness is an effective way of disclosing software bugs. However, when executing robustness tests, a very large number of service responses has to be manually classified to distinguish regular responses from responses that indicate robustness problems. Besides requiring a large amount of time and effort, this complex classification process can easily lead to errors resulting from the human intervention in such a laborious task. Text classification algorithms have been applied successfully in many contexts (e.g., spam identification, text categorization, etc.) and are considered a powerful tool for the successful automation of several classification-based tasks. In this paper we present a study on the applicability of five widely used text classification algorithms in the context of web services robustness testing. In practice, we assess the effectiveness of Support Vector Machines, Naïve Bayes, Large Linear Classification, K-nearest neighbor (IBk), and Hyperpipes in classifying web services responses. Results indicate that these algorithms can be effectively used to automate the identification of robustness issues while reducing human intervention. However, in all mechanisms there are cases of misclassified responses, which means that there is room for improvement.
{"title":"Applying Text Classification Algorithms in Web Services Robustness Testing","authors":"N. Laranjeiro, R. Oliveira, M. Vieira","doi":"10.1109/SRDS.2010.36","DOIUrl":"https://doi.org/10.1109/SRDS.2010.36","url":null,"abstract":"Testing web services for robustness is an effective way of disclosing software bugs. However, when executing robustness tests, a very large amount of service responses has to be manually classified to distinguish regular responses from responses that indicate robustness problems. Besides requiring a large amount of time and effort, this complex classification process can easily lead to errors resulting from the human intervention in such a laborious task. Text classification algorithms have been applied successfully in many contexts (e.g., spam identification, text categorization, etc) and are considered a powerful tool for the successful automation of several classification-based tasks. In this paper we present a study on the applicability of five widely used text classification algorithms in the context of web services robustness testing. In practice, we assess the effectiveness of Support Vector Machines, Naïve Bayes, Large Linear Classification, K-nearest neighbor (Ibk), and Hyperpipes in classifying web services responses. Results indicate that these algorithms can be effectively used to automate the identification of robustness issues while reducing human intervention. 
However, in all mechanisms there are cases of misclassified responses, which means that there is space for improvement.","PeriodicalId":219204,"journal":{"name":"2010 29th IEEE Symposium on Reliable Distributed Systems","volume":"55 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124865672","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
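As a flavor of how such automated response classification works, here is a self-contained multinomial Naïve Bayes sketch over toy service responses; the paper itself evaluates mature implementations of five algorithms, not this simplified one:

```python
import math
from collections import Counter, defaultdict

def train_nb(samples):
    """Train a multinomial Naive Bayes model on (text, label) pairs."""
    word_counts = defaultdict(Counter)   # per-label word frequencies
    label_counts = Counter()             # class priors
    vocab = set()
    for text, label in samples:
        words = text.lower().split()
        label_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return word_counts, label_counts, vocab

def classify(model, text):
    """Pick the label maximizing the smoothed log-likelihood of the text."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best, best_score = None, float("-inf")
    for label in label_counts:
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)  # add-one smoothing
        for w in text.lower().split():
            score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best, best_score = label, score
    return best
```

Trained on a handful of labeled responses (say, stack traces as "failure" and confirmations as "regular"), the classifier routes unseen responses to the closer class, which is exactly the manual step the paper seeks to automate.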
Although the technology and applications of wireless sensor networks have advanced greatly over the last years, ensuring dependable real-time operation despite faults and temporal uncertainties is still an ongoing research topic. The problems are particularly significant when considering that future applications will interact with their environment not only for supervision or monitoring, but also to directly control physical (real-time) entities, sometimes with safety-critical requirements. We believe that reasoning in terms of data validity might be a good way to approach the problem. The ability to know whether sensor data flowing in the system is valid – data validity awareness – is a first step towards achieving dependable operation. But more than that, it should be possible to ensure, given requirements for data validity throughout the operation, a dependable perception of the environment. In this paper we discuss the problem, analyzing some of the issues that need to be addressed to achieve these goals. In particular, we introduce fundamental concepts and relevant definitions, elaborate on the main impediments to achieving data validity awareness, and describe relevant means to deal with these impediments. Finally, we address the issue of ensuring dependable perception and present some research ideas in this direction.
{"title":"Data Validity and Dependable Perception in Networked Sensor-Based Systems","authors":"Luis Marques, A. Casimiro","doi":"10.1109/SRDS.2010.52","DOIUrl":"https://doi.org/10.1109/SRDS.2010.52","url":null,"abstract":"Although the technology and applications of wireless sensor networks have greatly increased over the last years, ensuring a dependable real-time operation despite faults and temporal uncertainties is still an on-going research topic. The problems are particularly significant when considering that future applications will interact with their environment not only for supervision or monitoring, but also to directly control physical (real-time) entities, sometimes with safety-critical requirements. We believe that reasoning in terms of data validity might be a good way to approach the problem. The ability to know if sensor data flowing in the system is valid – data validity awareness –, is a first step to achieve a dependable operation. But more than that, it should be possible to ensure, given requirements for data validity throughout the operation, a dependable perception of the environment. In this paper we essentially discuss the problem, analyzing some of the issues that need to be addressed to achieve these goals. Particularly, we introduce fundamental concepts and relevant definitions, we elaborate on the main impediments to achieve data validity awareness and describe relevant means to deal with these impediments. 
Finally, we address the issue of ensuring a dependable perception and present some research ideas in this direction.","PeriodicalId":219204,"journal":{"name":"2010 29th IEEE Symposium on Reliable Distributed Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130113483","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
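One simple way to make "data validity" concrete, assuming a known bound on how fast the measured quantity can change, is to derive a worst-case error interval from the age of a reading. The functions below are our illustration of the concept, not definitions from the paper:

```python
def validity_interval(value, age, max_rate):
    """Interval guaranteed to contain the current value of the measured entity.

    A reading taken `age` time units ago, combined with a physical bound
    `max_rate` on how fast the entity can change, bounds the present truth.
    """
    drift = max_rate * age
    return (value - drift, value + drift)

def is_valid(age, max_rate, tolerance):
    """Data is still valid if its worst-case error fits the application's tolerance."""
    return max_rate * age <= tolerance
```

The interval widens as the reading ages, so validity degrades gracefully; an application can state its tolerance and the system can decide, per reading, whether perception is still dependable.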
Haifeng Chen, Guofei Jiang, Hui Zhang, K. Yoshihira
With the growing scale of current computing systems, traditional configuration tuning methods become less effective because they usually assume a small number of parameters in the system. In order to handle the scalability issue of configuration tuning, this paper proposes a cooperative optimization framework, which mimics the behavior of team play to discover the optimal configuration setting in computing systems. We follow a ‘best of the best’ rule to decompose the tuning task into a number of small subtasks with manageable size and complexity. While each decomposed module is responsible for the optimization of its own configuration parameters, all the modules share the performance evaluations of new samples as common feedback to enhance their optimization objectives. As a result, the quality of generated samples improves during the search, and the cooperative sampling eventually discovers the optimal configurations in the system. Experimental results demonstrate that our proposed cooperative optimization can identify better solutions within limited time periods compared with other state-of-the-art configuration search methods. This advantage becomes more significant when the number of configuration parameters increases.
{"title":"A Cooperative Sampling Approach to Discovering Optimal Configurations in Large Scale Computing Systems","authors":"Haifeng Chen, Guofei Jiang, Hui Zhang, K. Yoshihira","doi":"10.1109/SRDS.2010.21","DOIUrl":"https://doi.org/10.1109/SRDS.2010.21","url":null,"abstract":"With the growing scale of current computing systems, traditional configuration tuning methods become less effective because they usually assume a small number of parameters in the system. In order to handle the scalability issue of configuration tuning, this paper proposes a cooperative optimization framework, which mimics the behavior of team playing to discover the optimal configuration setting in computing systems. We follow a ‘best of the best’ rule to decompose the tuning task into a number of small subtasks with manageable size and complexity. While each decomposed module is responsible for the optimization of its own configuration parameters, all the modules share the performance evaluations of new samples as common feedbacks to enhance their optimization objectives. As a result, the qualities of generated samples become improved during the search, and the cooperative sampling will eventually discover the optimal configurations in the system. Experimental results demonstrate that our proposed cooperative optimization can identify better solutions within limited time periods compared with other state of the art configuration search methods. 
Such advantage becomes more significant when the number of configuration parameters increases.","PeriodicalId":219204,"journal":{"name":"2010 29th IEEE Symposium on Reliable Distributed Systems","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125416186","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
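The ‘best of the best’ decomposition can be sketched as follows: each module perturbs only its own parameter group, while every full-configuration evaluation is shared and improvements replace a common incumbent. This is an illustrative simplification of the paper's cooperative sampling:

```python
import random

def cooperative_search(objective, n_params, groups, iters, rng):
    """Cooperative configuration search: modules tune disjoint parameter groups.

    Each module perturbs only its own parameters, but every evaluation of the
    full configuration is shared; improvements replace the common incumbent.
    Minimizes `objective`.
    """
    best = [0.0] * n_params
    best_score = objective(best)
    for _ in range(iters):
        for group in groups:                 # each module takes its turn
            candidate = best[:]
            for i in group:                  # perturb only this module's parameters
                candidate[i] += rng.uniform(-1.0, 1.0)
            score = objective(candidate)
            if score < best_score:           # shared feedback across modules
                best, best_score = candidate, score
    return best, best_score
```

Each subtask searches a space of manageable dimension, yet the shared incumbent means no module ever works against another's improvements, which is the point of the cooperative design.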
A. Bondavalli, F. Brancati, A. Ceccarelli, M. Vadursi
A software clock capable of self-evaluating its synchronization uncertainty is experimentally validated for a specific implementation on a node synchronized through NTP. The validation methodology takes advantage of an external node equipped with a GPS-synchronized clock acting as a reference, which is connected to the node hosting the system under test through a fast Ethernet connection. Experiments are carried out for different values of the software clock parameters and different types of workload, and address the possible occurrence of faults in the system under test and in the NTP synchronization mechanism. The validation methodology is designed to be as unintrusive as possible and to guarantee a resolution on the order of a few hundred microseconds. The experimental results show very good performance of R&SAClock, and their analysis gives valuable hints for further improvements.
{"title":"Experimental Validation of a Synchronization Uncertainty-Aware Software Clock","authors":"A. Bondavalli, F. Brancati, A. Ceccarelli, M. Vadursi","doi":"10.1109/SRDS.2010.35","DOIUrl":"https://doi.org/10.1109/SRDS.2010.35","url":null,"abstract":"A software clock capable of self-evaluating its synchronization uncertainty is experimentally validated for a specific implementation on a node synchronized through NTP. The validation methodology takes advantage of an external node equipped with a GPS-synchronized clock acting as a reference, which is connected to the node hosting the system under test through a fast Ethernet connection. Experiments are carried out for different values of the software clock parameters and different types of workload, and address the possible occurrence of faults in the system under test and in the NTP synchronization mechanism. The validation methodology is designed to be as less intrusive as possible and to grant a resolution of the order of few hundreds of microseconds. The experimental results show very good performance of R&SAClock, and their analysis gives precious hints for further improvements.","PeriodicalId":219204,"journal":{"name":"2010 29th IEEE Symposium on Reliable Distributed Systems","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126687954","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
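The idea of a clock that reports its own synchronization uncertainty can be sketched as follows; the class name and linear drift model are illustrative assumptions, not the R&SAClock implementation:

```python
import time

class UncertaintyAwareClock:
    """A clock that returns its own synchronization uncertainty with each read.

    After each synchronization the uncertainty restarts at `sync_error` and
    grows linearly with the assumed worst-case oscillator drift rate.
    (Illustrative model; not the R&SAClock API.)
    """
    def __init__(self, drift_rate, sync_error, now=time.monotonic):
        self.drift_rate = drift_rate         # e.g. 50e-6 for a 50 ppm oscillator
        self.sync_error = sync_error         # residual error right after a sync
        self._now = now                      # injectable time source, eases testing
        self.last_sync = now()

    def synchronize(self):
        """Record an external synchronization event (e.g. an NTP adjustment)."""
        self.last_sync = self._now()

    def read(self):
        """Return (timestamp, uncertainty): true time lies within +/- uncertainty."""
        t = self._now()
        return t, self.sync_error + self.drift_rate * (t - self.last_sync)
```

A validation setup like the paper's would compare each `(timestamp, uncertainty)` pair against a GPS-disciplined reference and check that the true time indeed falls inside the self-reported interval.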