Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles: latest papers

Operating System Transactions

Pub Date : 2009-10-11 DOI: 10.1145/1629575.1629591

Donald E. Porter, O. S. Hofmann, C. Rossbach, Alexander Benn, E. Witchel

Applications must be able to synchronize accesses to operating system resources in order to ensure correctness in the face of concurrency and system failures. System transactions allow the programmer to specify updates to heterogeneous system resources with the OS guaranteeing atomicity, consistency, isolation, and durability (ACID). System transactions efficiently and cleanly solve persistent concurrency problems that are difficult to address with other techniques. For example, system transactions eliminate security vulnerabilities in the file system that are caused by time-of-check-to-time-of-use (TOCTTOU) race conditions. System transactions enable an unsuccessful software installation to roll back without disturbing concurrent, independent updates to the file system. This paper describes TxOS, a variant of Linux 2.6.22 that implements system transactions. TxOS uses new implementation techniques to provide fast, serializable transactions with strong isolation and fairness between system transactions and non-transactional activity. The prototype demonstrates that a mature OS running on commodity hardware can provide system transactions at a reasonable performance cost. For instance, a transactional installation of OpenSSH incurs only 10% overhead, and a non-transactional compilation of Linux incurs negligible overhead on TxOS. By making transactions a central OS abstraction, TxOS enables new transactional services. For example, one developer prototyped a transactional ext3 file system in less than one month.

Pages: 161-176
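The TOCTTOU race mentioned in the abstract can be illustrated with a short Python sketch. This is not TxOS code and uses no TxOS interface; the function names `racy_read` and `safer_read` are hypothetical, and the sketch assumes a POSIX system where `os.O_NOFOLLOW` is available. It contrasts a check-then-use sequence with the conventional workaround of opening once and checking the open descriptor.

```python
import os
import stat


def racy_read(path):
    """Check-then-use: the path can be replaced between the two steps."""
    if os.path.islink(path):        # time of check
        raise PermissionError("refusing to follow a symlink")
    with open(path, "rb") as f:     # time of use: path may be a symlink by now
        return f.read()


def safer_read(path):
    """Open once, then inspect the descriptor that was actually opened."""
    # O_NOFOLLOW makes open() fail if the final path component is a symlink.
    fd = os.open(path, os.O_RDONLY | os.O_NOFOLLOW)
    with os.fdopen(fd, "rb") as f:
        # fstat() describes the open file, not whatever the path names now.
        if not stat.S_ISREG(os.fstat(f.fileno()).st_mode):
            raise PermissionError("not a regular file")
        return f.read()
```

The workaround only covers this one check on this one file. The abstract's claim is broader: a system transaction would let the check and the use execute atomically and in isolation without per-call workarounds.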

Citations: 110

Distributed aggregation for data-parallel computing: interfaces and implementations

Pub Date : 2009-10-11 DOI: 10.1145/1629575.1629600

Yuan Yu, P. Gunda, M. Isard

Data-intensive applications are increasingly designed to execute on large computing clusters. Grouped aggregation is a core primitive of many distributed programming models, and it is often the most efficient available mechanism for computations such as matrix multiplication and graph traversal. Such algorithms typically require non-standard aggregations that are more sophisticated than traditional built-in database functions such as Sum and Max. As a result, the ease of programming user-defined aggregations, and the efficiency of their implementation, is of great current interest. This paper evaluates the interfaces and implementations for user-defined aggregation in several state-of-the-art distributed computing systems: Hadoop, databases such as Oracle Parallel Server, and DryadLINQ. We show that the degree of language integration between user-defined functions and the high-level query language has an impact on code legibility and simplicity; that the choice of programming interface has a material effect on the performance of computations; that some execution plans perform better than others on average; and that, to get good performance on a variety of workloads, a system must be able to select between execution plans depending on the computation. The interface and execution plan described in the MapReduce paper, and implemented by Hadoop, are found to be among the worst-performing choices.
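A user-defined aggregation that goes beyond Sum and Max can be sketched in Python as follows. This is a minimal illustration of a decomposable aggregation (an average computed from per-partition partial results), not the interface of Hadoop, DryadLINQ, or any database; the names `initial`, `combine`, `final`, and `distributed_average` are assumptions made for this example.

```python
from functools import reduce


def initial(value):
    """Turn one input record into a partial result (sum, count)."""
    return (value, 1)


def combine(a, b):
    """Merge two partial results; associative and commutative."""
    return (a[0] + b[0], a[1] + b[1])


def final(partial):
    """Produce the user-visible result from the merged partial."""
    total, count = partial
    return total / count


def aggregate_partition(values):
    """Runs locally on one node, so only a small partial crosses the network."""
    return reduce(combine, map(initial, values))


def distributed_average(partitions):
    """Combine the per-partition partials, then finalize once."""
    partials = [aggregate_partition(p) for p in partitions if p]
    return final(reduce(combine, partials))
```

For example, `distributed_average([[1, 2, 3], [4, 5], [6]])` merges the partials `(6, 3)`, `(9, 2)`, and `(6, 1)` before dividing. The sketch assumes at least one non-empty partition.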

Pages: 247-260

Citations: 197

UpRight cluster services

Pub Date : 2009-10-11 DOI: 10.1145/1629575.1629602

Allen Clement, Manos Kapritsos, Sangmin Lee, Yang Wang, L. Alvisi, M. Dahlin, Taylor L. Riché

The UpRight library seeks to make Byzantine fault tolerance (BFT) a simple and viable alternative to crash fault tolerance for a range of cluster services. We demonstrate UpRight by producing BFT versions of the Zookeeper lock service and the Hadoop Distributed File System (HDFS). Our design choices in UpRight favor simplifying adoption by existing applications; performance is a secondary concern. Despite these priorities, our BFT Zookeeper and BFT HDFS implementations have performance comparable with the originals while providing additional robustness.

Citations: 258

Tolerating hardware device failures in software

Pub Date : 2009-10-11 DOI: 10.1145/1629575.1629582

Asim Kadav, Matthew J. Renzelmann, M. Swift

Hardware devices can fail, but many drivers assume they do not. When confronted with real devices that misbehave, these assumptions can lead to driver or system failures. While major operating system and device vendors recommend that drivers detect and recover from hardware failures, we find that there are many drivers that will crash or hang when a device fails. Such bugs cannot easily be detected by regular stress testing because the failures are induced by the device and not the software load. This paper describes Carburizer, a code-manipulation tool and associated runtime that improves system reliability in the presence of faulty devices. Carburizer analyzes driver source code to find locations where the driver incorrectly trusts the hardware to behave. Carburizer identified almost 1000 such bugs in Linux drivers with a false positive rate of less than 8 percent. With the aid of shadow drivers for recovery, Carburizer can automatically repair 840 of these bugs with no programmer involvement. To facilitate proactive management of device failures, Carburizer can also locate existing driver code that detects device failures and insert missing failure-reporting code. Finally, the Carburizer runtime can detect and tolerate interrupt-related bugs, such as stuck or missing interrupts.
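One way a driver can trust hardware too much is by waiting indefinitely for a status bit that a broken device never sets. The abstract does not list the bug classes Carburizer handles, so treat this as an assumed example of the general idea rather than a description of the tool. The Python sketch below shows a bounded wait with failure reporting; `wait_for_ready`, `DeviceTimeout`, and the `read_status` callback are hypothetical names.

```python
import time


class DeviceTimeout(Exception):
    """Raised when the device does not report ready within the deadline."""


def wait_for_ready(read_status, ready_mask, timeout_s=0.1, poll_interval_s=0.001):
    """Poll a status register, but give up instead of hanging forever.

    read_status is a caller-supplied function that returns the current
    register value as an int (a stand-in for a real register read).
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if read_status() & ready_mask:
            return
        if time.monotonic() >= deadline:
            # Report the failure so recovery code can take over.
            raise DeviceTimeout("device did not become ready in %.3f s" % timeout_s)
        time.sleep(poll_interval_s)
```

An unbounded version of the same loop (`while not read_status() & ready_mask: pass`) hangs when the device is stuck, which stress testing under software load alone would not reveal.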

Pages: 59-72

Citations: 114

RouteBricks: exploiting parallelism to scale software routers

Pub Date : 2009-10-11 DOI: 10.1145/1629575.1629578

Mihai Dobrescu, Norbert Egi, K. Argyraki, Byung-Gon Chun, K. Fall, G. Iannaccone, A. Knies, M. Manesh, S. Ratnasamy

We revisit the problem of scaling software routers, motivated by recent advances in server technology that enable high-speed parallel processing, a feature that router workloads appear ideally suited to exploit. We propose a software router architecture that parallelizes router functionality both across multiple servers and across multiple cores within a single server. By carefully exploiting parallelism at every opportunity, we demonstrate a 35 Gbps parallel router prototype; this router capacity can be linearly scaled through the use of additional servers. Our prototype router is fully programmable using the familiar Click/Linux environment and is built entirely from off-the-shelf, general-purpose server hardware.

Citations: 554

Helios: heterogeneous multiprocessing with satellite kernels

Pub Date : 2009-10-11 DOI: 10.1145/1629575.1629597

Edmund B. Nightingale, O. Hodson, R. McIlroy, C. Hawblitzel, G. Hunt

Helios is an operating system designed to simplify the task of writing, deploying, and tuning applications for heterogeneous platforms. Helios introduces satellite kernels, which export a single, uniform set of OS abstractions across CPUs of disparate architectures and performance characteristics. Access to I/O services such as file systems is made transparent via remote message passing, which extends a standard microkernel message-passing abstraction to a satellite kernel infrastructure. Helios retargets applications to available ISAs by compiling from an intermediate language. To simplify deploying and tuning application performance, Helios exposes an affinity metric to developers. Affinity provides a hint to the operating system about whether a process would benefit from executing on the same platform as a service it depends upon. We developed satellite kernels for an XScale programmable I/O card and for cache-coherent NUMA architectures. We offloaded several applications and operating system components, often by changing only a single line of metadata. We show up to a 28% performance improvement by offloading tasks to the XScale I/O card. On a mail-server benchmark, we show a 39% improvement in performance by automatically splitting the application among multiple NUMA domains.

Pages: 221-234

Citations: 230

Automatic device driver synthesis with Termite

Pub Date : 2009-10-11 DOI: 10.1145/1629575.1629583

L. Ryzhyk, P. Chubb, I. Kuz, Etienne Le Sueur, G. Heiser

Faulty device drivers cause significant damage through down time and data loss. The problem can be mitigated by an improved driver development process that guarantees correctness by construction. We achieve this by synthesising drivers automatically from formal specifications of device interfaces, thus reducing the impact of human error on driver reliability and potentially cutting down on development costs. We present a concrete driver synthesis approach and tool called Termite. We discuss the methodology, the technical and practical limitations of driver synthesis, and provide an evaluation of non-trivial drivers for Linux, generated using our tool. We show that the performance of the generated drivers is on par with the equivalent manually developed drivers. Furthermore, we demonstrate that device specifications can be reused across different operating systems by generating a driver for FreeBSD from the same specification as used for Linux.

Citations: 113

Heat-ray: combating identity snowball attacks using machine learning, combinatorial optimization and attack graphs

Pub Date : 2009-10-11 DOI: 10.1145/1629575.1629605

John Dunagan, A. Zheng, Daniel R. Simon

As computers have become ever more interconnected, the complexity of security configuration has exploded. Management tools have not kept pace, and we show that this has made identity snowball attacks into a critical danger. Identity snowball attacks leverage the users logged in to a first compromised host to launch additional attacks with those users' privileges on other hosts. To combat such attacks, we present Heat-ray, a system that combines machine learning, combinatorial optimization and attack graphs to scalably manage security configuration. Through evaluation on an organization with several hundred thousand users and machines, we show that Heat-ray allows IT administrators to reduce by 96% the number of machines that can be used to launch a large-scale identity snowball attack.
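The identity snowball described above can be sketched as reachability over a small attack graph. This Python example is only an illustration of the attack model in the abstract, not Heat-ray's algorithm (which also involves machine learning and combinatorial optimization); the function `snowball_reach` and the two input mappings are assumed names and structures.

```python
from collections import deque


def snowball_reach(start, logged_in, admin_on):
    """Return every machine an attacker can take over from `start`.

    logged_in: machine -> set of users currently logged in there
    admin_on:  user -> set of machines where that user has admin rights
    """
    compromised = {start}
    queue = deque([start])
    while queue:
        machine = queue.popleft()
        # A compromised host yields the identities of its logged-in users...
        for user in logged_in.get(machine, ()):
            # ...and those identities unlock the hosts the users administer.
            for target in admin_on.get(user, ()):
                if target not in compromised:
                    compromised.add(target)
                    queue.append(target)
    return compromised
```

Removing a single privilege edge, such as one user's admin right on one server, can shrink the reachable set sharply, which is the kind of configuration change the abstract says administrators can use to limit large-scale attacks.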

Citations: 33

Quincy: fair scheduling for distributed computing clusters

Pub Date : 2009-10-11 DOI: 10.1145/1629575.1629601

M. Isard, Vijayan Prabhakaran, J. Currey, Udi Wieder, Kunal Talwar, A. Goldberg

This paper addresses the problem of scheduling concurrent jobs on clusters where application data is stored on the computing nodes. This setting, in which scheduling computations close to their data is crucial for performance, is increasingly common and arises in systems such as MapReduce, Hadoop, and Dryad as well as many grid-computing environments. We argue that data-intensive computation benefits from a fine-grain resource sharing model that differs from the coarser semi-static resource allocations implemented by most existing cluster computing architectures. The problem of scheduling with locality and fairness constraints has not previously been extensively studied under this resource-sharing model. We introduce a powerful and flexible new framework for scheduling concurrent distributed jobs with fine-grain resource sharing. The scheduling problem is mapped to a graph data structure, where edge weights and capacities encode the competing demands of data locality, fairness, and starvation-freedom, and a standard solver computes the optimal online schedule according to a global cost model. We evaluate our implementation of this framework, which we call Quincy, on a cluster of a few hundred computers using a varied workload of data- and CPU-intensive jobs. We evaluate Quincy against an existing queue-based algorithm and implement several policies for each scheduler, with and without fairness constraints. Quincy gets better fairness when fairness is requested, while substantially improving data locality. The volume of data transferred across the cluster is reduced by up to a factor of 3.9 in our experiments, leading to a throughput increase of up to 40%.
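The idea of choosing a schedule by minimizing a global cost can be shown on a toy instance. The Python sketch below is not Quincy's graph formulation or solver: it replaces the flow solver with brute-force enumeration, models only data locality (no fairness or starvation-freedom), and uses made-up cost constants. All names (`cheapest_assignment`, `data_location`, `local_cost`, `remote_cost`) are assumptions for illustration.

```python
from itertools import permutations


def cheapest_assignment(tasks, machines, data_location, local_cost=1, remote_cost=10):
    """Assign each task to a distinct machine at minimum total cost.

    data_location maps a task to the machine holding its input data.
    Running a task elsewhere costs remote_cost instead of local_cost.
    Requires len(tasks) <= len(machines); exponential time, toy sizes only.
    """
    best_cost, best = None, None
    for chosen in permutations(machines, len(tasks)):
        cost = sum(
            local_cost if data_location[task] == machine else remote_cost
            for task, machine in zip(tasks, chosen)
        )
        if best_cost is None or cost < best_cost:
            best_cost, best = cost, dict(zip(tasks, chosen))
    return best, best_cost
```

When two tasks want the same machine, only one can run locally and the global cost reflects the remote transfer for the other. A real scheduler would encode such trade-offs as edge weights and capacities and hand them to a standard solver, as the abstract describes.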

Pages: 261-276

Citations: 959

Surviving sensor network software faults

Pub Date : 2009-10-11 DOI: 10.1145/1629575.1629598

Yang Chen, O. Gnawali, Maria A. Kazandjieva, P. Levis, J. Regehr

We describe Neutron, a version of the TinyOS operating system that efficiently recovers from memory safety bugs. Where existing schemes reboot an entire node on an error, Neutron's compiler and runtime extensions divide programs into recovery units and reboot only the faulting unit. The TinyOS kernel itself is a recovery unit: a kernel safety violation appears to applications as the processor being unavailable for 10-20 milliseconds. Neutron further minimizes safety violation cost by supporting "precious" state that persists across reboots. Application data, time synchronization state, and routing tables can all be declared as precious. Neutron's reboot sequence conservatively checks that precious state is not the source of a fault before preserving it. Together, recovery units and precious state allow Neutron to reduce a safety violation's cost to time synchronization by 94% and to a routing protocol by 99.5%. Neutron also protects applications from losing data. Neutron provides this recovery on the very limited resources of a tiny, low-power microcontroller.
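The interplay of recovery units and precious state can be sketched in a few lines of Python. This is a conceptual model only, not Neutron's compiler or runtime mechanism (which targets TinyOS on microcontrollers); the class `RecoveryUnit` and the caller-supplied `validate` predicate are hypothetical, with the predicate standing in for Neutron's conservative check that precious state is not the source of a fault.

```python
class RecoveryUnit:
    """A restartable part of a program with state that may survive reboots."""

    def __init__(self, name, precious_defaults, validate):
        self.name = name
        self._defaults = dict(precious_defaults)
        self._validate = validate      # returns True if precious state looks sane
        self.precious = dict(precious_defaults)
        self.scratch = {}              # ordinary state, always lost on reboot

    def reboot(self):
        """Restart only this unit; keep precious state if it passes the check."""
        self.scratch = {}
        if not self._validate(self.precious):
            # Suspect state is discarded rather than carried into the restart.
            self.precious = dict(self._defaults)
```

A routing unit could, for instance, declare its next-hop entry precious so that a safety violation elsewhere in the unit does not force the route to be rediscovered after the reboot.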

Citations: 59
