Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles: latest articles

PeerReview: practical accountability for distributed systems

Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles

Pub Date : 2007-10-14 DOI: 10.1145/1294261.1294279

Andreas Haeberlen, P. Kuznetsov, P. Druschel

We describe PeerReview, a system that provides accountability in distributed systems. PeerReview ensures that Byzantine faults whose effects are observed by a correct node are eventually detected and irrefutably linked to a faulty node. At the same time, PeerReview ensures that a correct node can always defend itself against false accusations. These guarantees are particularly important for systems that span multiple administrative domains, which may not trust each other. PeerReview works by maintaining a secure record of the messages sent and received by each node. The record is used to automatically detect when a node's behavior deviates from that of a given reference implementation, thus exposing faulty nodes. PeerReview is widely applicable: it only requires that a correct node's actions are deterministic, that nodes can sign messages, and that each node is periodically checked by a correct node. We demonstrate that PeerReview is practical by applying it to three different types of distributed systems: a network filesystem, a peer-to-peer system, and an overlay multicast system.
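
To make the idea of a secure per-node message record checked against a reference implementation more concrete, here is a minimal Python sketch. It is not PeerReview's actual protocol: the SHA-256 hash chain, the `reference_step` state machine, and every name below are assumptions chosen for illustration, and message signatures are left out entirely.

```python
import hashlib
import json


def entry_hash(prev_hash, kind, payload):
    """Hash an entry together with its predecessor so later edits are detectable."""
    data = json.dumps([prev_hash, kind, payload]).encode()
    return hashlib.sha256(data).hexdigest()


class Log:
    """Append-only record of messages a node received ("recv") and sent ("send")."""

    def __init__(self):
        self.entries = []  # list of (kind, payload, hash) tuples

    def append(self, kind, payload):
        prev = self.entries[-1][2] if self.entries else ""
        self.entries.append((kind, payload, entry_hash(prev, kind, payload)))


def reference_step(state, message):
    """Hypothetical deterministic reference implementation: a running byte counter."""
    new_state = state + len(message)
    return new_state, f"ack:{new_state}"


def audit(log):
    """Return True if the chain is intact and every send matches the reference."""
    prev, state, expected = "", 0, None
    for kind, payload, digest in log.entries:
        if digest != entry_hash(prev, kind, payload):
            return False  # the log was modified after the fact
        prev = digest
        if kind == "recv":
            if expected is not None:
                return False  # a required reply was never sent
            state, expected = reference_step(state, payload)
        elif kind == "send":
            if payload != expected:
                return False  # output deviates from the reference implementation
            expected = None
        else:
            return False  # unknown entry type
    return True


honest = Log()
honest.append("recv", "hello")
honest.append("send", "ack:5")

faulty = Log()
faulty.append("recv", "hello")
faulty.append("send", "ack:99")

print(audit(honest), audit(faulty))
```

The audit depends on the reference implementation being deterministic, which mirrors the requirement stated in the abstract: replaying the same inputs must yield the same outputs, otherwise a correct node could not be told apart from a faulty one.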

Citations: 431

Secure virtual architecture: a safe execution environment for commodity operating systems

Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles

Pub Date : 2007-10-14 DOI: 10.1145/1294261.1294295

J. Criswell, Andrew Lenharth, Dinakar Dhurjati, Vikram S. Adve

This paper describes an efficient and robust approach to provide a safe execution environment for an entire operating system, such as Linux, and all its applications. The approach, which we call Secure Virtual Architecture (SVA), defines a virtual, low-level, typed instruction set suitable for executing all code on a system, including kernel and application code. SVA code is translated for execution by a virtual machine transparently, offline or online. SVA aims to enforce fine-grained (object level) memory safety, control-flow integrity, type safety for a subset of objects, and sound analysis. A virtual machine implementing SVA achieves these goals by using a novel approach that exploits properties of existing memory pools in the kernel and by preserving the kernel's explicit control over memory, including custom allocators and explicit deallocation. Furthermore, the safety properties can be encoded compactly as extensions to the SVA type system, allowing the (complex) safety checking compiler to be outside the trusted computing base. SVA also defines a set of OS interface operations that abstract all privileged hardware instructions, allowing the virtual machine to monitor all privileged operations and control the physical resources on a given hardware platform. We have ported the Linux kernel to SVA, treating it as a new architecture, and made only minimal code changes (less than 300 lines of code) to the machine-independent parts of the kernel and device drivers. SVA is able to prevent 4 out of 5 memory safety exploits previously reported for the Linux 2.4.22 kernel for which exploit code is available, and would prevent the fifth one simply by compiling an additional kernel library.

Citations: 159

VirtualPower: coordinated power management in virtualized enterprise systems

Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles

Pub Date : 2007-10-14 DOI: 10.1145/1294261.1294287

Ripal Nathuji, K. Schwan

Power management has become increasingly necessary in large-scale datacenters to address costs and limitations in cooling or power delivery. This paper explores how to integrate power management mechanisms and policies with the virtualization technologies being actively deployed in these environments. The goals of the proposed VirtualPower approach to online power management are (i) to support the isolated and independent operation assumed by guest virtual machines (VMs) running on virtualized platforms and (ii) to make it possible to control and globally coordinate the effects of the diverse power management policies applied by these VMs to virtualized resources. To attain these goals, VirtualPower extends to guest VMs "soft" versions of the hardware power states for which their policies are designed. The resulting technical challenge is to appropriately map VM-level updates made to soft power states to actual changes in the states or in the allocation of underlying virtualized hardware. An implementation of VirtualPower Management (VPM) for the Xen hypervisor addresses this challenge by provision of multiple system-level abstractions including VPM states, channels, mechanisms, and rules. Experimental evaluations on modern multicore platforms highlight resulting improvements in online power management capabilities, including minimization of power consumption with little or no performance penalties and the ability to throttle power consumption while still meeting application requirements. Finally, coordination of online methods for server consolidation with VPM management techniques in heterogeneous server systems is shown to provide up to 34% improvements in power consumption.

Citations: 763

Generalized file system dependencies

Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles

Pub Date : 2007-10-14 DOI: 10.1145/1294261.1294291

Christopher Frost, Mike Mammarella, E. Kohler, Andrew de los Reyes, Shant Hovsepian, Andrew Matsuoka, Lei Zhang

Reliable storage systems depend in part on "write-before" relationships where some changes to stable storage are delayed until other changes commit. A journaled file system, for example, must commit a journal transaction before applying that transaction's changes, and soft updates and other consistency enforcement mechanisms have similar constraints, implemented in each case in system-dependent ways. We present a general abstraction, the patch, that makes write-before relationships explicit and file system agnostic. A patch-based file system implementation expresses dependencies among writes, leaving lower system layers to determine write orders that satisfy those dependencies. Storage system modules can examine and modify the dependency structure, and generalized file system dependencies are naturally exportable to user level. Our patch-based storage system, Featherstitch, includes several important optimizations that reduce patch overheads by orders of magnitude. Our ext2 prototype runs in the Linux kernel and supports asynchronous writes, soft updates-like dependencies, and journaling. It outperforms similarly reliable ext2 and ext3 configurations on some, but not all, benchmarks. It also supports unusual configurations, such as correct dependency enforcement within a loopback file system, and lets applications define consistency requirements without micromanaging how those requirements are satisfied.
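
The split between expressing write-before dependencies and choosing a write order can be sketched as a topological sort over a dependency graph. The Python snippet below is only an illustration under assumed names: the patch labels and the `write_before` mapping are hypothetical and do not reflect Featherstitch's actual data structures or optimizations.

```python
from graphlib import TopologicalSorter

# Each hypothetical patch maps to the set of patches that must reach
# stable storage before it (its write-before dependencies).
write_before = {
    "journal_data": set(),
    "journal_commit": {"journal_data"},
    "inode_update": {"journal_commit"},
    "bitmap_update": {"journal_commit"},
}


def write_order(deps):
    """Return one write order that satisfies every write-before dependency."""
    return list(TopologicalSorter(deps).static_order())


def respects(order, deps):
    """Check that each patch appears after all patches it depends on."""
    position = {patch: index for index, patch in enumerate(order)}
    return all(
        position[before] < position[patch]
        for patch, befores in deps.items()
        for before in befores
    )


order = write_order(write_before)
print(order, respects(order, write_before))
```

Any order accepted by `respects` is valid, so a lower layer remains free to pick whichever one suits the disk best. A cyclic dependency set makes `static_order` raise `graphlib.CycleError`, since no write order can satisfy it.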

Citations: 67

Triage: diagnosing production run failures at the user's site

Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles

Pub Date : 2007-10-14 DOI: 10.1145/1294261.1294275

Joseph A. Tucek, Shan Lu, Chengdu Huang, S. Xanthos, Yuanyuan Zhou

Diagnosing production run failures is a challenging yet important task. Most previous work focuses on offsite diagnosis, i.e., development-site diagnosis with the programmers present. This is insufficient for production-run failures because: (1) it is difficult to reproduce failures offsite for diagnosis; (2) offsite diagnosis cannot provide timely guidance for recovery or security purposes; (3) it is infeasible to provide a programmer to diagnose every production run failure; and (4) privacy concerns limit the release of information (e.g., coredumps) to programmers. To address production-run failures, we propose a system, called Triage, that automatically performs onsite software failure diagnosis at the very moment of failure. It provides a detailed diagnosis report, including the failure nature, triggering conditions, related code and variables, the fault propagation chain, and potential fixes. Triage achieves this by leveraging lightweight reexecution support to efficiently capture the failure environment, repeatedly replay the moment of failure, and dynamically analyze an occurring failure using different diagnosis techniques. Triage employs a failure diagnosis protocol that mimics the steps a human takes in debugging. This extensible protocol provides a framework to enable the use of various existing and new diagnosis techniques. We also propose a new failure diagnosis technique, delta analysis, to identify failure-related conditions, code, and variables. We evaluate these ideas in real system experiments with 10 real software failures from 9 open source applications including four servers. Triage accurately diagnoses the evaluated failures, providing likely root causes and even the fault propagation chain, while keeping normal-run overhead to under 5%. Finally, our user study of the diagnosis and repair of real bugs shows that Triage saves time (99.99% confidence), reducing the total time to fix by almost half.

Citations: 157

Session details: Storage

Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles

Pub Date : 2007-06-12 DOI: 10.1145/3262308

John R. Douceur

Citations: 0

RaceTrack: efficient detection of data race conditions via adaptive tracking

Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles

Pub Date : 2005-10-23 DOI: 10.1145/1095810.1095832

Yuan Yu, T. Rodeheffer, Wei Chen

Bugs due to data races in multithreaded programs often exhibit non-deterministic symptoms and are notoriously difficult to find. This paper describes RaceTrack, a dynamic race detection tool that tracks the actions of a program and reports a warning whenever a suspicious pattern of activity has been observed. RaceTrack uses a novel hybrid detection algorithm and employs an adaptive approach that automatically directs more effort to areas that are more suspicious, thus providing more accurate warnings for much less overhead. A post-processing step correlates warnings and ranks code segments based on how strongly they are implicated in potential data races. We implemented RaceTrack inside the virtual machine of Microsoft's Common Language Runtime (product version v1.1.4322) and monitored several major, real-world applications directly out-of-the-box, without any modification. Adaptive tracking resulted in a slowdown ratio of about 3x on memory-intensive programs and typically much less than 2x on other programs, and a memory ratio of typically less than 1.2x. Several serious data race bugs were revealed, some previously unknown.
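
For readers unfamiliar with dynamic race detection, the sketch below shows a plain lockset check in Python: a variable is flagged once two or more threads have accessed it and no single lock was held on every access. This is a deliberately simplified, swapped-in technique and not RaceTrack's hybrid, adaptive algorithm, whose details the abstract does not give. The class, method, and variable names are hypothetical.

```python
class LocksetDetector:
    """Warn when a shared variable is not consistently protected by any lock."""

    def __init__(self):
        self.candidates = {}  # variable -> locks held on every access so far
        self.threads = {}     # variable -> set of threads that accessed it
        self.warnings = []    # variables flagged as potential data races

    def access(self, thread, variable, held_locks):
        held = frozenset(held_locks)
        if variable not in self.candidates:
            self.candidates[variable] = held
        else:
            # Keep only the locks that were held on all accesses seen so far.
            self.candidates[variable] = self.candidates[variable] & held
        self.threads.setdefault(variable, set()).add(thread)
        shared = len(self.threads[variable]) > 1
        if shared and not self.candidates[variable]:
            if variable not in self.warnings:
                self.warnings.append(variable)


detector = LocksetDetector()
detector.access("T1", "counter", {"m"})
detector.access("T2", "counter", {"m"})   # always under lock m: no warning
detector.access("T1", "flag", {"m"})
detector.access("T2", "flag", set())      # no common lock: warning
print(detector.warnings)
```

A pure lockset check like this can report false alarms, for example when threads are ordered by fork/join rather than by locks, which is one reason a detector might combine it with other information.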

Citations: 414

The case for judicious resource management

Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles

Pub Date : 2005-10-23 DOI: 10.1145/1095810.1118579

C. Poellabauer, Timothy Durnan

Consider the following scenario taken from the mobile and wireless computing domain. Energy has been receiving increasing attention, resulting in a number of different energy management techniques, including Dynamic Voltage Scaling (DVS) [1]. DVS is based on the concept of reducing the speed/voltage of a CPU when it is under-utilized, thereby reducing its power consumption while increasing the task execution times. In real-time systems, DVS algorithms have to compute energy-saving speed/voltage levels while ensuring that task deadlines are met. The figure below visualizes this problem for two devices A and B, where shaded areas indicate times of power consumption caused by the CPU and arrows indicate communication between two devices. The vertical line indicates the end-to-end deadline Td, i.e., the processing and communication steps of both devices A and B have to be concluded before Td. Typical examples for such scenarios are sensor networks with in-network data aggregation or mobile multimedia. For example, device A captures an image, compresses it, and sends it to B, which decompresses and displays it. The figure shows the same scenario twice, once without DVS and once with DVS. In the latter case, both devices reduce their energy overheads, but device B also misses its deadline. As a consequence, either one or both devices have to increase their clock frequencies to ensure that the deadline is met, increasing their energy costs. However, if both devices operate in isolation, A -- unaware of the missed end-to-end deadline -- would continue to operate at its low speed, while B has to increase its speed. Now assume that B is essential to the operation of the distributed system, but at the same time it is also the more energy-constrained device (e.g., the remaining battery lifetime is lower than A's). In this case, it is desirable that A reduces its use of DVS, such that B can continue to fully exploit its DVS capability to prolong its battery life. 
To achieve that, it is necessary for A and B to negotiate limits to the use of DVS, e.g., by introducing a deadline on A, called virtual deadline Tv (rightmost graph in above figure). This deadline forces A to run faster (limiting the extent to which A can exploit DVS), but allows B to fully utilize DVS.
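
The effect of the virtual deadline Tv can be worked through with a small Python calculation. All numbers below are hypothetical, and the model is a common simplification rather than anything stated in the text: execution time is taken as cycles divided by frequency, and energy per cycle is assumed to scale with the square of the frequency.

```python
F_MAX_HZ = 1.0e9   # hypothetical maximum clock frequency of both devices
CYCLES_A = 2.0e8   # hypothetical work on device A (capture and compress)
CYCLES_B = 2.0e8   # hypothetical work on device B (decompress and display)
COMM_S = 0.1       # hypothetical transfer time from A to B, in seconds
TD_S = 1.0         # end-to-end deadline Td, in seconds


def min_frequency(cycles, budget_s):
    """Lowest frequency that finishes `cycles` of work within `budget_s` seconds."""
    return cycles / budget_s


def relative_energy(freq_hz):
    """Energy per cycle relative to F_MAX_HZ, assuming it scales with f squared."""
    return (freq_hz / F_MAX_HZ) ** 2


def plan(tv_s):
    """Frequencies for A and B when A must finish by the virtual deadline Tv."""
    freq_a = min_frequency(CYCLES_A, tv_s)
    freq_b = min_frequency(CYCLES_B, TD_S - tv_s - COMM_S)
    return freq_a, freq_b


for tv in (0.45, 0.30):
    freq_a, freq_b = plan(tv)
    print(
        f"Tv={tv:.2f}s  A: {freq_a / 1e6:.0f} MHz (energy x{relative_energy(freq_a):.2f})"
        f"  B: {freq_b / 1e6:.0f} MHz (energy x{relative_energy(freq_b):.2f})"
    )
```

Under these assumptions, moving Tv earlier makes A run faster and spend more energy, while B gains a longer time budget and can run at a lower frequency, which is the trade the text argues for when B is the more energy-constrained device.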

Citations: 0

BAR fault tolerance for cooperative services

Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles

Pub Date : 2005-10-23 DOI: 10.1145/1095810.1095816

Amitanand S. Aiyer, L. Alvisi, Allen Clement, M. Dahlin, Jean-Philippe Martin, C. Porth

This paper describes a general approach to constructing cooperative services that span multiple administrative domains. In such environments, protocols must tolerate both Byzantine behaviors, when broken, misconfigured, or malicious nodes arbitrarily deviate from their specification, and rational behaviors, when selfish nodes deviate from their specification to increase their local benefit. The paper makes three contributions: (1) it introduces the BAR (Byzantine, Altruistic, Rational) model as a foundation for reasoning about cooperative services; (2) it proposes a general three-level architecture to reduce the complexity of building services under the BAR model; and (3) it describes an implementation of BAR-B, the first cooperative backup service to tolerate both Byzantine users and an unbounded number of rational users. At the core of BAR-B is an asynchronous replicated state machine that provides the customary safety and liveness guarantees despite nodes exhibiting both Byzantine and rational behaviors. Our prototype provides acceptable performance for our application: our BAR-tolerant state machine executes 15 requests per second, and our BAR-B backup service can back up 100 MB of data in under 4 minutes.

Citations: 348

FS2: dynamic data replication in free disk space for improving disk performance and energy consumption

Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles

Pub Date : 2005-10-23 DOI: 10.1145/1095810.1095836

Hai Huang, Wanda Hung, K. Shin

Disk performance is increasingly limited by its head positioning latencies, i.e., seek time and rotational delay. To reduce the head positioning latencies, we propose a novel technique that dynamically places copies of data in the file system's free blocks according to the disk access patterns observed at runtime. As one or more replicas can now be accessed in addition to their original data block, choosing the "nearest" replica that provides the fastest access can significantly improve performance for disk I/O operations. We implemented and evaluated a prototype based on the popular Ext2 file system. In our prototype, since the file system layout is modified only by using the free/unused disk space (hence the name Free Space File System, or FS2), users are completely oblivious to how the file system layout is modified in the background; they will only notice performance improvements over time. For a wide range of workloads running under Linux, FS2 is shown to reduce disk access time by 41–68% (as a result of a 37–78% shorter seek time and a 31–68% shorter rotational delay), yielding a 16–34% overall improvement in user-perceived performance. The reduced disk access time also leads to 40–71% energy savings per access.
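
The "nearest replica" idea in the abstract can be illustrated with a small sketch. This is not the FS2 implementation: the function name `nearest_replica`, the block numbers, and the cost model (absolute distance in logical block numbers, which ignores rotational delay) are all assumptions made for illustration.

```python
def nearest_replica(head_pos, copies):
    """Pick the copy of a block that is closest to the current head position.

    head_pos: logical block number the disk head is currently near (assumed known).
    copies:   block numbers of the original block and its replicas.
    The distance metric is a simplification; a real system would also
    account for rotational delay.
    """
    if not copies:
        raise ValueError("a block must have at least one copy")
    return min(copies, key=lambda block: abs(block - head_pos))


# Hypothetical layout: the original block plus two replicas in free space.
copies = [1200, 40500, 98000]
print(nearest_replica(41000, copies))  # prints 40500
```

With more replicas spread across the disk, the expected distance to the closest copy shrinks, which is one way to read the reported reductions in seek time.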

Citations: 200