
ACM Transactions on Computer Systems: Latest Publications

DieCast: Testing Distributed Systems with an Accurate Scale Model
IF 1.5 | CAS Zone 4 (Computer Science) | Q2 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2008-04-16 | DOI: 10.1145/1963559.1963560
Diwaker Gupta, K. Vishwanath, Amin Vahdat
Large-scale network services can consist of tens of thousands of machines running thousands of unique software configurations spread across hundreds of physical networks. Testing such services for complex performance problems and configuration errors remains a difficult problem. Existing testing techniques, such as simulation or running smaller instances of a service, have limitations in predicting overall service behavior at such scales. Testing large services should ideally be done at the same scale and configuration as the target deployment, which can be technically and economically infeasible. We present DieCast, an approach to scaling network services in which we multiplex all of the nodes in a given service configuration as virtual machines across a much smaller number of physical machines in a test harness. We show how to accurately scale CPU, network, and disk to provide the illusion that each VM matches a machine in the original service in terms of both available computing resources and communication behavior. We present the architecture and evaluation of a system we built to support such experimentation and discuss its limitations. We show that for a variety of services---including a commercial high-performance cluster-based file system---and resource utilization levels, DieCast matches the behavior of the original service while using a fraction of the physical resources.
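The kind of arithmetic involved can be made concrete with a minimal sketch, assuming a time-dilation style approach in which each VM's notion of time is slowed by a time-dilation factor (TDF): a VM whose clock runs TDF times slower perceives the physical CPU, network, and disk share it is given as TDF times larger. The names and numbers below are invented for this illustration and are not DieCast's interfaces.

```python
# Illustrative arithmetic only (not DieCast code): with time-dilation factor
# `tdf`, a VM whose clock runs `tdf` times slower perceives its physical
# resource share as `tdf` times larger, so each VM needs 1/tdf of an
# original machine and `tdf` VMs can share one physical host.

from dataclasses import dataclass
from math import ceil

@dataclass
class NodeSpec:
    cpu_ghz: float     # CPU of one machine in the original service
    net_mbps: float    # NIC bandwidth of one machine
    disk_mbps: float   # disk throughput of one machine

def scaled_allocation(original: NodeSpec, tdf: int) -> NodeSpec:
    """Physical share a dilated VM needs so that, under `tdf`, it appears
    to have the resources of `original`."""
    return NodeSpec(original.cpu_ghz / tdf,
                    original.net_mbps / tdf,
                    original.disk_mbps / tdf)

def hosts_needed(num_service_nodes: int, tdf: int) -> int:
    """Hosts required when `tdf` dilated VMs are multiplexed per host."""
    return ceil(num_service_nodes / tdf)

if __name__ == "__main__":
    original = NodeSpec(cpu_ghz=3.0, net_mbps=1000.0, disk_mbps=200.0)
    print(scaled_allocation(original, tdf=10))  # each VM gets a tenth of everything
    print(hosts_needed(1000, tdf=10))           # a 1000-node service on 100 hosts
```

The corresponding cost of this style of scaling is that a dilated experiment takes roughly TDF times longer in wall-clock time, which is the price paid for the hardware savings.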
Citations: 127
High-bandwidth data dissemination for large-scale distributed systems
IF 1.5 | CAS Zone 4 (Computer Science) | Q2 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2008-02-01 | DOI: 10.1145/1328671.1328674
Dejan Kostic, A. Snoeren, Amin Vahdat, R. Braud, C. Killian, James W. Anderson, Jeannie R. Albrecht, Adolfo Rodriguez, Erik Vandekieft
This article focuses on the multireceiver data dissemination problem. Initially, IP multicast formed the basis for efficiently supporting such distribution. More recently, overlay networks have emerged to support point-to-multipoint communication. Both techniques focus on constructing trees rooted at the source to distribute content among all interested receivers. We argue, however, that trees have two fundamental limitations for data dissemination. First, since all data comes from a single parent, participants must often continuously probe in search of a parent with an acceptable level of bandwidth. Second, due to packet losses and failures, available bandwidth is monotonically decreasing down the tree. To address these limitations, we present Bullet, a data dissemination mesh that takes advantage of the computational and storage capabilities of end hosts to create a distribution structure where a node receives data in parallel from multiple peers. For the mesh to deliver improved bandwidth and reliability, we need to solve several key problems: (i) disseminating disjoint data over the mesh, (ii) locating missing content, (iii) finding who to peer with (peering strategy), (iv) retrieving data at the right rate from all peers (flow control), and (v) recovering from failures and adapting to dynamically changing network conditions. Additionally, the system should be self-adjusting and should have few user-adjustable parameter settings. We describe our approach to addressing all of these problems in a working, deployed system across the Internet. Bullet outperforms state-of-the-art systems, including BitTorrent, by 25-70% and exhibits strong performance and reliability in a range of deployment settings. In addition, we find that, relative to tree-based solutions, Bullet reduces the need to perform expensive bandwidth probing.
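As a rough illustration of problems (i), (ii), and (iv) above, the sketch below plans which missing blocks to request from which peers, assigning each block to exactly one peer and capping per-peer requests with a simple budget so no single peer becomes a bottleneck parent. It is a toy model of the mesh idea only, not Bullet's actual peering strategy, digests, or flow control, and all names are made up for the example.

```python
# Toy sketch (not Bullet's protocol): given digests of which blocks each peer
# holds, assign every missing block to exactly one peer, keeping per-peer
# request counts under a simple flow-control budget.

from collections import defaultdict

def plan_requests(have: set, peer_digests: dict, per_peer_budget: int) -> dict:
    missing = set().union(*peer_digests.values()) - have
    plan = defaultdict(list)
    for block in sorted(missing):
        candidates = [p for p, blocks in peer_digests.items()
                      if block in blocks and len(plan[p]) < per_peer_budget]
        if candidates:
            # least-loaded candidate first, so requests spread across the mesh
            plan[min(candidates, key=lambda p: len(plan[p]))].append(block)
    return {peer: blocks for peer, blocks in plan.items() if blocks}

if __name__ == "__main__":
    have = {0, 1, 2}
    digests = {"peerA": {2, 3, 4, 5}, "peerB": {4, 5, 6}, "peerC": {3, 6, 7}}
    print(plan_requests(have, digests, per_peer_budget=2))
```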
Citations: 27
A generic component model for building systems software
IF 1.5 | CAS Zone 4 (Computer Science) | Q2 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2008-02-01 | DOI: 10.1145/1328671.1328672
G. Coulson, G. Blair, P. Grace, François Taïani, Ackbar Joolia, Kevin Lee, J. Ueyama, Thirunavukkarasu Sivaharan
Component-based software structuring principles are now commonplace at the application level; but componentization is far less established when it comes to building low-level systems software. Although there have been pioneering efforts in applying componentization to systems-building, these efforts have tended to target specific application domains (e.g., embedded systems, operating systems, communications systems, programmable networking environments, or middleware platforms). They also tend to be targeted at specific deployment environments (e.g., standard personal computer (PC) environments, network processors, or microcontrollers). The disadvantage of this narrow targeting is that it fails to maximize the genericity and abstraction potential of the component approach. In this article, we argue for the benefits and feasibility of a generic yet tailorable approach to component-based systems-building that offers a uniform programming model that is applicable in a wide range of systems-oriented target domains and deployment environments. The component model, called OpenCom, is supported by a reflective runtime architecture that is itself built from components. After describing OpenCom and evaluating its performance and overhead characteristics, we present and evaluate two case studies of systems we have built using OpenCom technology, thus illustrating its benefits and its general applicability.
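To make the component-model vocabulary concrete, here is a minimal sketch of components with provided interfaces, receptacles (required interfaces), and a runtime that performs third-party binding and can introspect the resulting architecture. It mirrors the general shape described above but is not OpenCom's API; every name is illustrative.

```python
# Minimal sketch of a component model in the spirit described above:
# components expose named interfaces, declare receptacles, and a tiny
# "reflective" runtime binds them and can inspect the bindings.

class Component:
    def __init__(self, name):
        self.name = name
        self.interfaces = {}    # interface name -> implementation object
        self.receptacles = {}   # receptacle name -> bound interface (or None)

    def provide(self, iface_name, impl):
        self.interfaces[iface_name] = impl

    def require(self, receptacle_name):
        self.receptacles[receptacle_name] = None

class Runtime:
    """Performs third-party binding and exposes the current architecture."""
    def __init__(self):
        self.components = {}
        self.bindings = []      # (from_component, receptacle, to_component, interface)

    def load(self, component):
        self.components[component.name] = component

    def bind(self, src, receptacle, dst, iface):
        src.receptacles[receptacle] = dst.interfaces[iface]
        self.bindings.append((src.name, receptacle, dst.name, iface))

    def introspect(self):
        return list(self.bindings)

if __name__ == "__main__":
    logger = Component("logger")
    logger.provide("ILog", lambda msg: print("[log]", msg))
    app = Component("app")
    app.require("log")

    rt = Runtime()
    rt.load(logger)
    rt.load(app)
    rt.bind(app, "log", logger, "ILog")
    app.receptacles["log"]("component bound via the runtime")
    print(rt.introspect())
```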
Citations: 407
Incrementally parallelizing database transactions with thread-level speculation
IF 1.5 | CAS Zone 4 (Computer Science) | Q2 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2008-02-01 | DOI: 10.1145/1328671.1328673
Christopher B. Colohan, A. Ailamaki, J. Steffan, T. Mowry
With the advent of chip multiprocessors, exploiting intratransaction parallelism in database systems is an attractive way of improving transaction performance. However, exploiting intratransaction parallelism is difficult for two reasons: first, significant changes are required to avoid races or conflicts within the DBMS; and second, adding threads to transactions requires a high level of sophistication from transaction programmers. In this article we show how dividing a transaction into speculative threads solves both problems—it minimizes the changes required to the DBMS, and the details of parallelization are hidden from the transaction programmer. Our technique requires a limited number of small, localized changes to a subset of the low-level data structures in the DBMS. Through this method of incrementally parallelizing transactions, we can dramatically improve performance: on a simulated four-processor chip-multiprocessor, we improve the response time by 44--66% for three of the five TPC-C transactions, assuming the availability of idle processors.
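The sketch below illustrates the idea of splitting one transaction into ordered speculative pieces: each piece runs against a snapshot while its reads and writes are tracked, and a piece that read data written by an earlier piece is replayed before committing. This is a sequential simulation of the concept for illustration only, not the paper's in-DBMS, hardware-assisted mechanism; the epoch functions and keys are invented.

```python
# Conceptual sketch of intra-transaction speculation: epochs run against the
# initial snapshot, then commit in program order; an epoch whose read set
# overlaps an earlier epoch's writes has a speculation violation and is
# replayed against the up-to-date committed state.

def run_speculative(store, epochs):
    def execute(epoch, state):
        reads, writes = set(), {}
        def read(k):
            reads.add(k)
            return writes.get(k, state[k])
        def write(k, v):
            writes[k] = v
        epoch(read, write)
        return reads, writes

    snapshot = dict(store)
    results = [execute(e, snapshot) for e in epochs]     # "parallel" speculation

    committed = dict(store)
    written_so_far = set()
    for i, (reads, writes) in enumerate(results):
        if reads & written_so_far:                       # violation: stale reads
            reads, writes = execute(epochs[i], committed)  # replay
        committed.update(writes)
        written_so_far |= set(writes)
    return committed

if __name__ == "__main__":
    store = {"stock": 10, "orders": 0, "audit": None}
    def e1(read, write): write("stock", read("stock") - 1)
    def e2(read, write): write("orders", read("orders") + 1)
    def e3(read, write): write("audit", read("stock"))    # reads e1's write -> replayed
    print(run_speculative(store, [e1, e2, e3]))            # audit sees the decremented stock
```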
Citations: 9
Memory scheduling for modern microprocessors
IF 1.5 | CAS Zone 4 (Computer Science) | Q2 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2007-12-01 | DOI: 10.1145/1314299.1314301
I. Hur, Calvin Lin
The need to carefully schedule memory operations has increased as memory performance has become increasingly important to overall system performance. This article describes the adaptive history-based (AHB) scheduler, which uses the history of recently scheduled operations to provide three conceptual benefits: (1) it allows the scheduler to better reason about the delays associated with its scheduling decisions, (2) it provides a mechanism for combining multiple constraints, which is important for increasingly complex DRAM structures, and (3) it allows the scheduler to select operations so that they match the program's mixture of Reads and Writes, thereby avoiding certain bottlenecks within the memory controller. We have previously evaluated this scheduler in the context of the IBM Power5. When compared with the state of the art, this scheduler improves performance by 15.6%, 9.9%, and 7.6% for the Stream, NAS, and commercial benchmarks, respectively. This article expands our understanding of the AHB scheduler in a variety of ways. Looking backwards, we describe the scheduler in the context of prior work that focused exclusively on avoiding bank conflicts, and we show that the AHB scheduler is superior for the IBM Power5, which we argue will be representative of future microprocessor memory controllers. Looking forwards, we evaluate this scheduler in the context of future systems by varying a number of microarchitectural features and hardware parameters. For example, we show that the benefit of this scheduler increases as we move to multithreaded environments.
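A toy selection rule conveys the flavor of history-based scheduling: prefer a pending command that avoids a recently used bank, and break ties toward the operation type (read or write) that keeps the issued mix close to the program's. The real AHB scheduler encodes its history in finite state machines and handles Power5-specific constraints; the code below is only a simplified illustration with invented parameters.

```python
# Simplified history-based selection (not the AHB implementation): avoid bank
# conflicts first, then steer the issued read/write mix toward the program's.

from collections import deque

def pick_next(queue, history, target_read_fraction, history_len=4):
    """queue: list of (op, bank); history: deque of recently issued (op, bank)."""
    recent_banks = {bank for _, bank in history}
    issued_reads = sum(1 for op, _ in history if op == "R")
    read_frac = issued_reads / len(history) if history else target_read_fraction
    want = "R" if read_frac < target_read_fraction else "W"

    def score(cmd):
        op, bank = cmd
        return (bank not in recent_banks,   # avoid a recently used bank first
                op == want)                 # then match the desired op type
    best = max(queue, key=score)
    queue.remove(best)
    history.append(best)
    if len(history) > history_len:
        history.popleft()
    return best

if __name__ == "__main__":
    q = [("R", 0), ("W", 1), ("R", 1), ("W", 2)]
    hist = deque([("R", 0)])
    while q:
        print(pick_next(q, hist, target_read_fraction=0.6))
```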
Citations: 31
Minimizing expected energy consumption in real-time systems through dynamic voltage scaling
IF 1.5 | CAS Zone 4 (Computer Science) | Q2 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2007-12-01 | DOI: 10.1145/1314299.1314300
Ruibin Xu, D. Mossé, R. Melhem
Many real-time systems, such as battery-operated embedded devices, are energy constrained. A common problem for these systems is how to reduce energy consumption in the system as much as possible while still meeting the deadlines; a commonly used power management mechanism by these systems is dynamic voltage scaling (DVS). Usually, the workloads executed by these systems are variable and, more often than not, unpredictable. Because of the unpredictability of the workloads, one cannot guarantee to minimize the energy consumption in the system. However, if the variability of the workloads can be captured by the probability distribution of the computational requirement of each task in the system, it is possible to achieve the goal of minimizing the expected energy consumption in the system. In this paper, we investigate DVS schemes that aim at minimizing expected energy consumption for frame-based hard real-time systems. Our investigation considers various DVS strategies (i.e., intra-task DVS, inter-task DVS, and hybrid DVS) and both an ideal system model (i.e., assuming unrestricted continuous frequency, well-defined power-frequency relation, and no speed change overhead) and a realistic system model (i.e., the processor provides a set of discrete speeds, no assumption is made on power-frequency relation, and speed change overhead is considered). The highlights of the investigation are two practical DVS schemes: Practical PACE (PPACE) for a single task and Practical Inter-Task DVS (PITDVS2) for general frame-based systems. Evaluation results show that our proposed schemes outperform and achieve significant energy savings over existing schemes.
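As a worked example of the objective, the sketch below picks, from a set of discrete speeds, the single constant speed that minimizes expected energy power(f) * E[cycles] / f while still meeting the deadline at the worst-case cycle count. This is a deliberately simplified setting (one speed per task, an explicit measured power table, no speed-change overhead) with made-up numbers; the paper's PPACE and PITDVS2 schemes vary speed within and across tasks and are considerably more sophisticated.

```python
# Simplified expected-energy minimization for one task, assuming a single
# constant speed and a measured power table (invented values).

def best_constant_speed(cycle_pmf, deadline, speeds, power):
    """cycle_pmf: {cycles: probability}; deadline in seconds;
    speeds in cycles/second; power: {speed: watts}."""
    worst_case = max(cycle_pmf)
    expected_cycles = sum(c * p for c, p in cycle_pmf.items())
    feasible = [f for f in speeds if worst_case / f <= deadline]
    if not feasible:
        raise ValueError("no speed meets the deadline in the worst case")
    # expected energy at speed f: power(f) * E[cycles] / f
    return min(feasible, key=lambda f: power[f] * expected_cycles / f)

if __name__ == "__main__":
    pmf = {2e8: 0.7, 4e8: 0.2, 8e8: 0.1}                  # mostly short runs
    speeds = [4e8, 6e8, 8e8, 1e9]                         # available discrete speeds
    power = {4e8: 0.4, 6e8: 0.9, 8e8: 1.8, 1e9: 3.0}      # invented power table (W)
    print(best_constant_speed(pmf, deadline=1.0, speeds=speeds, power=power))
```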
Citations: 74
Labels and event processes in the Asbestos operating system
IF 1.5 | CAS Zone 4 (Computer Science) | Q2 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2007-12-01 | DOI: 10.1145/1095810.1095813
P. Efstathopoulos, M. Krohn, Steve Vandebogart, C. Frey, David Ziegler, E. Kohler, David Mazières, F. Kaashoek, R. Morris
Asbestos, a new prototype operating system, provides novel labeling and isolation mechanisms that help contain the effects of exploitable software flaws. Applications can express a wide range of policies with Asbestos's kernel-enforced label mechanism, including controls on inter-process communication and system-wide information flow. A new event process abstraction provides lightweight, isolated contexts within a single process, allowing the same process to act on behalf of multiple users while preventing it from leaking any single user's data to any other user. A Web server that uses Asbestos labels to isolate user data requires about 1.5 memory pages per user, demonstrating that additional security can come at an acceptable cost.
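The sketch below gives a simplified picture of label checking: a label maps tags to levels with a per-label default, and data may flow from a sender to a receiver only if the sender's label is dominated at every tag. This captures the flavor of contamination-based information flow but omits most of the real Asbestos model (its declassification level, ports, and event processes); the tags and levels here are illustrative.

```python
# Simplified label check (not the Asbestos label model): flow is allowed only
# if the send label is dominated by the receive label at every tag.

class Label:
    def __init__(self, default, explicit=None):
        self.default = default
        self.explicit = dict(explicit or {})
    def level(self, tag):
        return self.explicit.get(tag, self.default)
    def tags(self):
        return set(self.explicit)

def can_flow(send: "Label", receive: "Label") -> bool:
    for tag in send.tags() | receive.tags():
        if send.level(tag) > receive.level(tag):
            return False
    return send.default <= receive.default      # covers all unlisted tags

if __name__ == "__main__":
    alice_data = Label(default=1, explicit={"alice": 3})    # tainted with alice's tag
    worker_clean = Label(default=2)                         # accepts only up to level 2
    worker_alice = Label(default=2, explicit={"alice": 3})  # cleared for alice's data
    print(can_flow(alice_data, worker_clean))   # False: alice taint exceeds the cap
    print(can_flow(alice_data, worker_alice))   # True
```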
Citations: 118
Zyzzyva: Speculative Byzantine fault tolerance
IF 1.5 | CAS Zone 4 (Computer Science) | Q2 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2007-10-14 | DOI: 10.1145/1294261.1294267
R. Kotla, L. Alvisi, M. Dahlin, Allen Clement, Edmund L. Wong
We present Zyzzyva, a protocol that uses speculation to reduce the cost and simplify the design of Byzantine fault tolerant state machine replication. In Zyzzyva, replicas respond to a client's request without first running an expensive three-phase commit protocol to reach agreement on the order in which the request must be processed. Instead, they optimistically adopt the order proposed by the primary and respond immediately to the client. Replicas can thus become temporarily inconsistent with one another, but clients detect inconsistencies, help correct replicas converge on a single total ordering of requests, and only rely on responses that are consistent with this total order. This approach allows Zyzzyva to reduce replication overheads to near their theoretical minima.
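The client-side decision can be sketched as a small classification rule, assuming the standard n = 3f + 1 replica sizing: if all 3f + 1 speculative replies match, the request completes in a single round; if at least 2f + 1 match, the client gathers a commit certificate before completing; otherwise it falls back to retransmission and, eventually, a view change. This is an illustration of the thresholds only, not the full protocol with certificates and view changes.

```python
# Sketch of the client-side thresholds, assuming n = 3f + 1 replicas.

from collections import Counter

def classify_responses(responses, f):
    """responses: list of (order, digest) speculative replies from replicas."""
    if not responses:
        return "retry/view-change"
    (_, count), = Counter(responses).most_common(1)
    if count == 3 * f + 1:
        return "complete-fast"           # single round trip
    if count >= 2 * f + 1:
        return "needs-commit-cert"       # second phase before completing
    return "retry/view-change"

if __name__ == "__main__":
    f = 1   # tolerate one Byzantine fault with 3f + 1 = 4 replicas
    fast = [("seq5", "d1")] * 4
    slow = [("seq5", "d1")] * 3 + [("seq5", "bad")]
    print(classify_responses(fast, f))   # complete-fast
    print(classify_responses(slow, f))   # needs-commit-cert
```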
Citations: 991
Rx: Treating bugs as allergies—a safe method to survive software failures
IF 1.5 | CAS Zone 4 (Computer Science) | Q2 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2007-08-01 | DOI: 10.1145/1275517.1275519
Feng Qin, Joseph A. Tucek, Yuanyuan Zhou, Jagadeesan Sundaresan
Many applications demand availability. Unfortunately, software failures greatly reduce system availability. Prior work on surviving software failures suffers from one or more of the following limitations: required application restructuring, inability to address deterministic software bugs, unsafe speculation on program execution, and long recovery time. This paper proposes an innovative safe technique, called Rx, which can quickly recover programs from many types of software bugs, both deterministic and nondeterministic. Our idea, inspired from allergy treatment in real life, is to rollback the program to a recent checkpoint upon a software failure, and then to reexecute the program in a modified environment. We base this idea on the observation that many bugs are correlated with the execution environment, and therefore can be avoided by removing the “allergen” from the environment. Rx requires few to no modifications to applications and provides programmers with additional feedback for bug diagnosis. We have implemented Rx on Linux. Our experiments with five server applications that contain seven bugs of various types show that Rx can survive six out of seven software failures and provide transparent fast recovery within 0.017--0.16 seconds, 21--53 times faster than the whole program restart approach for all but one case (CVS). In contrast, the two tested alternatives, a whole program restart approach and a simple rollback and reexecution without environmental changes, cannot successfully recover the four servers (Squid, Apache, CVS, and ypserv) that contain deterministic bugs, and have only a 40% recovery rate for the server (MySQL) that contains a nondeterministic concurrency bug. Additionally, Rx's checkpointing system is lightweight, imposing small time and space overheads.
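The recovery loop has a simple shape: restore the checkpoint, re-execute under a changed environment, and escalate to a more disruptive change if the failure recurs. The sketch below illustrates that loop; the listed perturbations are examples in the spirit of the paper (padding allocations, zero-filling memory, serializing threads, dropping a suspicious request), and the function names are invented for the example.

```python
# Conceptual rollback-and-reexecute loop (not Rx itself): retry from a
# checkpoint under a series of increasingly disruptive environment changes.

import copy

PERTURBATIONS = [
    {},                                  # first retry: plain re-execution
    {"pad_allocations": True},           # survive some buffer overflows
    {"zero_fill_memory": True},          # survive reads of uninitialized data
    {"serialize_threads": True},         # hide concurrency bugs
    {"drop_last_request": True},         # shed a possibly malicious input
]

def run_with_rollback(run, checkpoint):
    """`run(state, env)` executes from `state` under `env` and raises on failure."""
    for env in PERTURBATIONS:
        state = copy.deepcopy(checkpoint)    # rollback to the checkpoint
        try:
            return run(state, env)
        except Exception as failure:
            print(f"failure {failure!r}; retrying with environment {env}")
    raise RuntimeError("all environment changes exhausted; restart the program")

if __name__ == "__main__":
    def buggy_server(state, env):
        if not env.get("zero_fill_memory"):
            raise MemoryError("read of uninitialized buffer")
        state["served"] += 1
        return state
    print(run_with_rollback(buggy_server, {"served": 0}))
```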
Citations: 95
Gossip-based peer sampling
IF 1.5 | CAS Zone 4 (Computer Science) | Q2 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2007-08-01 | DOI: 10.1145/1275517.1275520
Márk Jelasity, Spyros Voulgaris, R. Guerraoui, Anne-Marie Kermarrec, M. Steen
Gossip-based communication protocols are appealing in large-scale distributed applications such as information dissemination, aggregation, and overlay topology management. This paper factors out a fundamental mechanism at the heart of all these protocols: the peer-sampling service. In short, this service provides every node with peers to gossip with. We promote this service to the level of a first-class abstraction of a large-scale distributed system, similar to a name service being a first-class abstraction of a local-area system. We present a generic framework to implement a peer-sampling service in a decentralized manner by constructing and maintaining dynamic unstructured overlays through gossiping membership information itself. Our framework generalizes existing approaches and makes it easy to discover new ones. We use this framework to empirically explore and compare several implementations of the peer-sampling service. Through extensive simulation experiments we show that---although all protocols provide a good quality uniform random stream of peers to each node locally---traditional theoretical assumptions about the randomness of the unstructured overlays as a whole do not hold in any of the instances. We also show that different design decisions result in severe differences from the point of view of two crucial aspects: load balancing and fault tolerance. Our simulations are validated by means of a wide-area implementation.
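One round of a gossip-based view exchange can be sketched as: pick a peer from the local partial view, swap a few entries (including a fresh entry for yourself), merge keeping the freshest entry per peer, and truncate to the view size. The policies chosen below (uniform peer selection, dropping the oldest entries) are just one point in the design space the paper explores; the view size and exchange size are made-up parameters.

```python
# Toy peer-sampling round: exchange partial views with one peer and merge.

import random

VIEW_SIZE = 8      # partial view size (made-up value)
EXCHANGE = 4       # entries shipped per exchange (made-up value)

def merge(view, incoming, self_id):
    """View selection: keep the freshest entry per peer, cap the view size."""
    for peer, age in incoming.items():
        if peer != self_id and (peer not in view or age < view[peer]):
            view[peer] = age
    while len(view) > VIEW_SIZE:
        view.pop(max(view, key=view.get))       # drop the oldest entries

def gossip_round(views, node):
    """One active exchange by `node`: pick a peer, swap partial views, merge."""
    view = views[node]
    if not view:
        return
    peer = random.choice(list(view))                                  # peer selection
    offer = dict(random.sample(sorted(view.items()), min(EXCHANGE, len(view))))
    offer[node] = 0                                                   # fresh self entry
    reply = dict(sorted(views[peer].items(), key=lambda kv: kv[1])[:EXCHANGE])
    merge(views[peer], offer, peer)
    merge(view, reply, node)
    for p in view:                                                    # entries age
        view[p] += 1

if __name__ == "__main__":
    # a 20-node overlay where every node initially knows a single other node
    views = {i: ({0: 0} if i else {1: 0}) for i in range(20)}
    for _ in range(30):
        for n in views:
            gossip_round(views, n)
    print({n: sorted(views[n]) for n in range(3)})   # partial views have spread out
```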
Citations: 568