Marisabel Guevara, Benjamin Lubin, Benjamin C. Lee
Specialization of datacenter resources brings performance and energy improvements in response to the growing scale and diversity of cloud applications. Yet heterogeneous hardware adds complexity and volatility to latency-sensitive applications. A resource allocation mechanism that leverages architectural principles can overcome both of these obstacles. We integrate research in heterogeneous architectures with recent advances in multi-agent systems. By embedding architectural insight into proxies that bid on behalf of applications, a market effectively allocates hardware to applications with diverse preferences and valuations. Exploring a space of heterogeneous datacenter configurations that mix server-class Xeon and mobile-class Atom processors, we find an optimal heterogeneous balance that improves both welfare and energy efficiency. We further design and evaluate twelve design points along the Xeon-to-Atom spectrum, and find that a mix of three processor architectures achieves a 12× reduction in response time violations relative to equal-power homogeneous systems.
{"title":"Market mechanisms for managing datacenters with heterogeneous microarchitectures","authors":"Marisabel Guevara, Benjamin Lubin, Benjamin C. Lee","doi":"10.1145/2541258","DOIUrl":"https://doi.org/10.1145/2541258","url":null,"abstract":"Specialization of datacenter resources brings performance and energy improvements in response to the growing scale and diversity of cloud applications. Yet heterogeneous hardware adds complexity and volatility to latency-sensitive applications. A resource allocation mechanism that leverages architectural principles can overcome both of these obstacles.\u0000 We integrate research in heterogeneous architectures with recent advances in multi-agent systems. Embedding architectural insight into proxies that bid on behalf of applications, a market effectively allocates hardware to applications with diverse preferences and valuations. Exploring a space of heterogeneous datacenter configurations, which mix server-class Xeon and mobile-class Atom processors, we find an optimal heterogeneous balance that improves both welfare and energy-efficiency. We further design and evaluate twelve design points along the Xeon-to-Atom spectrum, and find that a mix of three processor architectures achieves a 12× reduction in response time violations relative to equal-power homogeneous systems.","PeriodicalId":50918,"journal":{"name":"ACM Transactions on Computer Systems","volume":"17 1","pages":"3:1-3:31"},"PeriodicalIF":1.5,"publicationDate":"2014-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90515886","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Large-scale datacenters (DCs) host tens of thousands of diverse applications each day. However, interference between colocated workloads and the difficulty of matching applications to one of the many hardware platforms available can degrade performance, violating the quality of service (QoS) guarantees that many cloud workloads require. While previous work has identified the impact of heterogeneity and interference, existing solutions are computationally intensive, cannot be applied online, and do not scale beyond a few applications. We present Paragon, an online and scalable DC scheduler that is heterogeneity- and interference-aware. Paragon is derived from robust analytical methods; instead of profiling each application in detail, it leverages information the system already has about applications it has previously seen. It uses collaborative filtering techniques to quickly and accurately classify an unknown incoming workload with respect to heterogeneity and interference in multiple shared resources. It does so by identifying similarities to previously scheduled applications. The classification allows Paragon to greedily schedule applications in a manner that minimizes interference and maximizes server utilization. After the initial application placement, Paragon monitors application behavior and adjusts its scheduling decisions at runtime to avoid performance degradation. Additionally, we design ARQ, a multiclass admission control protocol that constrains application waiting time. ARQ queues applications in separate classes based on the type of resources they need and avoids long queueing delays for easy-to-satisfy workloads in highly loaded scenarios. Paragon scales to tens of thousands of servers and applications with marginal scheduling overheads in terms of time or state. We evaluate Paragon with a wide range of workload scenarios, on both small and large-scale systems, including 1,000 servers on EC2. For a 2,500-workload scenario, Paragon enforces performance guarantees for 91% of applications, while significantly improving utilization. In comparison, heterogeneity-oblivious, interference-oblivious, and least-loaded schedulers provide similar guarantees for only 14%, 11%, and 3% of workloads, respectively. The differences are more striking in oversubscribed scenarios where resource efficiency is more critical.
{"title":"QoS-Aware scheduling in heterogeneous datacenters with paragon","authors":"Christina Delimitrou, C. Kozyrakis","doi":"10.1145/2556583","DOIUrl":"https://doi.org/10.1145/2556583","url":null,"abstract":"Large-scale datacenters (DCs) host tens of thousands of diverse applications each day. However, interference between colocated workloads and the difficulty of matching applications to one of the many hardware platforms available can degrade performance, violating the quality of service (QoS) guarantees that many cloud workloads require. While previous work has identified the impact of heterogeneity and interference, existing solutions are computationally intensive, cannot be applied online, and do not scale beyond a few applications.\u0000 We present Paragon, an online and scalable DC scheduler that is heterogeneity- and interference-aware. Paragon is derived from robust analytical methods, and instead of profiling each application in detail, it leverages information the system already has about applications it has previously seen. It uses collaborative filtering techniques to quickly and accurately classify an unknown incoming workload with respect to heterogeneity and interference in multiple shared resources. It does so by identifying similarities to previously scheduled applications. The classification allows Paragon to greedily schedule applications in a manner that minimizes interference and maximizes server utilization. After the initial application placement, Paragon monitors application behavior and adjusts the scheduling decisions at runtime to avoid performance degradations. Additionally, we design ARQ, a multiclass admission control protocol that constrains application waiting time. ARQ queues applications in separate classes based on the type of resources they need and avoids long queueing delays for easy-to-satisfy workloads in highly-loaded scenarios. Paragon scales to tens of thousands of servers and applications with marginal scheduling overheads in terms of time or state.\u0000 We evaluate Paragon with a wide range of workload scenarios, on both small and large-scale systems, including 1,000 servers on EC2. For a 2,500-workload scenario, Paragon enforces performance guarantees for 91% of applications, while significantly improving utilization. In comparison, heterogeneity-oblivious, interference-oblivious, and least-loaded schedulers only provide similar guarantees for 14%, 11%, and 3% of workloads. The differences are more striking in oversubscribed scenarios where resource efficiency is more critical.","PeriodicalId":50918,"journal":{"name":"ACM Transactions on Computer Systems","volume":"21 1","pages":"12"},"PeriodicalIF":1.5,"publicationDate":"2013-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85967024","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. Balakrishnan, D. Malkhi, John D. Davis, Vijayan Prabhakaran, M. Wei, Ted Wobber
CORFU is a global log which clients can append to and read from over a network. Internally, CORFU is distributed over a cluster of machines in such a way that there is no single I/O bottleneck to either appends or reads. Data is fully replicated for fault tolerance, and a modest cluster of about 16--32 machines with SSD drives can sustain 1 million 4-KByte operations per second. The CORFU log enabled the construction of a variety of distributed applications that require strong consistency at high speeds, such as databases, transactional key-value stores, replicated state machines, and metadata services.
{"title":"CORFU: A distributed shared log","authors":"M. Balakrishnan, D. Malkhi, John D. Davis, Vijayan Prabhakaran, M. Wei, Ted Wobber","doi":"10.1145/2535930","DOIUrl":"https://doi.org/10.1145/2535930","url":null,"abstract":"CORFU is a global log which clients can append-to and read-from over a network. Internally, CORFU is distributed over a cluster of machines in such a way that there is no single I/O bottleneck to either appends or reads. Data is fully replicated for fault tolerance, and a modest cluster of about 16--32 machines with SSD drives can sustain 1 million 4-KByte operations per second.\u0000 The CORFU log enabled the construction of a variety of distributed applications that require strong consistency at high speeds, such as databases, transactional key-value stores, replicated state machines, and metadata services.","PeriodicalId":50918,"journal":{"name":"ACM Transactions on Computer Systems","volume":"42 1","pages":"10"},"PeriodicalIF":1.5,"publicationDate":"2013-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81662992","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Stephen Smaldone, Benjamin Gilbert, J. Harkes, L. Iftode, M. Satyanarayanan
This article investigates the transient use of free local storage for improving performance in VM-based mobile computing systems implemented as thick clients on host PCs. We use the term TransientPC systems to refer to these types of systems. The solution we propose, called TransPart, uses the higher-performing local storage of host hardware to speed up performance-critical operations. Our solution constructs a virtual storage device on demand (which we call transient storage) by borrowing free disk blocks from the host’s storage. In this article, we present the design, implementation, and evaluation of a TransPart prototype, which requires no modifications to the software or hardware of a host computer. Experimental results confirm that TransPart offers low overhead and startup cost, while improving user experience.
{"title":"Optimizing Storage Performance for VM-Based Mobile Computing","authors":"Stephen Smaldone, Benjamin Gilbert, J. Harkes, L. Iftode, M. Satyanarayanan","doi":"10.1145/2465346.2465348","DOIUrl":"https://doi.org/10.1145/2465346.2465348","url":null,"abstract":"This article investigates the transient use of free local storage for improving performance in VM-based mobile computing systems implemented as thick clients on host PCs. We use the term TransientPC systems to refer to these types of systems. The solution we propose, called TransPart, uses the higher-performing local storage of host hardware to speed up performance-critical operations. Our solution constructs a virtual storage device on demand (which we call transient storage) by borrowing free disk blocks from the host’s storage. In this article, we present the design, implementation, and evaluation of a TransPart prototype, which requires no modifications to the software or hardware of a host computer. Experimental results confirm that TransPart offers low overhead and startup cost, while improving user experience.","PeriodicalId":50918,"journal":{"name":"ACM Transactions on Computer Systems","volume":"11 1","pages":"5"},"PeriodicalIF":1.5,"publicationDate":"2013-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83839537","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Content-based publish/subscribe (CPS) is an appealing abstraction for building scalable distributed systems, e.g., message boards, intrusion detectors, or algorithmic stock trading platforms. Recently, CPS extensions have been proposed for location-based services like vehicular networks, mobile social networking, and so on. Although current CPS middleware systems are dynamic in the way they support the joining and leaving of publishers and subscribers, they fall short in supporting subscription adaptations. These are becoming increasingly important across many CPS applications. In algorithmic high-frequency trading, for instance, stock price thresholds that are of interest to a trader change rapidly, and gains directly hinge on the reaction time to relevant fluctuations rather than fixed values. In location-aware applications, a subscription is a function of the subscriber's location (e.g., GPS coordinates), which inherently changes during motion. The common solution for adapting a subscription consists of a resubscription, where a new subscription is issued and the superseded one canceled. This incurs substantial overhead in CPS middleware systems and leads to missed or duplicated events during the transition. In this article, we explore the concept of parametric subscriptions for capturing subscription adaptations. We discuss desirable and feasible guarantees for corresponding support, and propose novel algorithms for updating routing mechanisms effectively and efficiently in classic decentralized CPS broker overlay networks. Compared to resubscriptions, our algorithms significantly improve the reaction time to subscription updates without hampering throughput or latency under high update rates. We also propose and evaluate approximation techniques to detect and mitigate pathological cases of high-frequency subscription oscillations, which could significantly decrease the throughput of CPS systems, thereby affecting other subscribers. We analyze the benefits of our support through implementations of our algorithms in two CPS systems, and by evaluating our algorithms on two different application scenarios.
{"title":"Parametric Content-Based Publish/Subscribe","authors":"K. R. Jayaram, P. Eugster, C. Jayalath","doi":"10.1145/2465346.2465347","DOIUrl":"https://doi.org/10.1145/2465346.2465347","url":null,"abstract":"Content-based publish/subscribe (CPS) is an appealing abstraction for building scalable distributed systems, e.g., message boards, intrusion detectors, or algorithmic stock trading platforms. Recently, CPS extensions have been proposed for location-based services like vehicular networks, mobile social networking, and so on.\u0000 Although current CPS middleware systems are dynamic in the way they support the joining and leaving of publishers and subscribers, they fall short in supporting subscription adaptations. These are becoming increasingly important across many CPS applications. In algorithmic high frequency trading, for instance, stock price thresholds that are of interest to a trader change rapidly, and gains directly hinge on the reaction time to relevant fluctuations rather than fixed values. In location-aware applications, a subscription is a function of the subscriber location (e.g. GPS coordinates), which inherently changes during motion.\u0000 The common solution for adapting a subscription consists of a resubscription, where a new subscription is issued and the superseded one canceled. This incurs substantial overhead in CPS middleware systems, and leads to missed or duplicated events during the transition. In this article, we explore the concept of parametric subscriptions for capturing subscription adaptations. We discuss desirable and feasible guarantees for corresponding support, and propose novel algorithms for updating routing mechanisms effectively and efficiently in classic decentralized CPS broker overlay networks. Compared to resubscriptions, our algorithms significantly improve the reaction time to subscription updates without hampering throughput or latency under high update rates. We also propose and evaluate approximation techniques to detect and mitigate pathological cases of high frequency subscription oscillations, which could significantly decrease the throughput of CPS systems thereby affecting other subscribers.\u0000 We analyze the benefits of our support through implementations of our algorithms in two CPS systems, and by evaluating our algorithms on two different application scenarios.","PeriodicalId":50918,"journal":{"name":"ACM Transactions on Computer Systems","volume":"18 1","pages":"4"},"PeriodicalIF":1.5,"publicationDate":"2013-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84012856","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A. Rasmussen, G. Porter, Michael Conley, H. Madhyastha, Radhika Niranjan Mysore, A. Pucher, Amin Vahdat
We present TritonSort, a highly efficient, scalable sorting system. It is designed to process large datasets, and has been evaluated against as much as 100TB of input data spread across 832 disks in 52 nodes at a rate of 0.938TB/min. When evaluated against the annual Indy GraySort sorting benchmark, TritonSort is 66% better in absolute performance and has over six times the per-node throughput of the previous record holder. When evaluated against the 100TB Indy JouleSort benchmark, TritonSort sorted 9703 records/Joule. In this article, we describe the hardware and software architecture necessary to operate TritonSort at this level of efficiency. Through careful management of system resources to ensure cross-resource balance, we are able to sort data at approximately 80% of the disks’ aggregate sequential write speed. We believe the work holds a number of lessons for balanced system design and for scale-out architectures in general. While many interesting systems are able to scale linearly with additional servers, per-server performance can lag behind per-server capacity by more than an order of magnitude. Bridging the gap between high scalability and high performance would enable either significantly less expensive systems that are able to do the same work or provide the ability to address significantly larger problem sets with the same infrastructure.
{"title":"TritonSort: A Balanced and Energy-Efficient Large-Scale Sorting System","authors":"A. Rasmussen, G. Porter, Michael Conley, H. Madhyastha, Radhika Niranjan Mysore, A. Pucher, Amin Vahdat","doi":"10.1145/2427631.2427634","DOIUrl":"https://doi.org/10.1145/2427631.2427634","url":null,"abstract":"We present TritonSort, a highly efficient, scalable sorting system. It is designed to process large datasets, and has been evaluated against as much as 100TB of input data spread across 832 disks in 52 nodes at a rate of 0.938TB/min. When evaluated against the annual Indy GraySort sorting benchmark, TritonSort is 66% better in absolute performance and has over six times the per-node throughput of the previous record holder. When evaluated against the 100TB Indy JouleSort benchmark, TritonSort sorted 9703 records/Joule. In this article, we describe the hardware and software architecture necessary to operate TritonSort at this level of efficiency. Through careful management of system resources to ensure cross-resource balance, we are able to sort data at approximately 80% of the disks’ aggregate sequential write speed.\u0000 We believe the work holds a number of lessons for balanced system design and for scale-out architectures in general. While many interesting systems are able to scale linearly with additional servers, per-server performance can lag behind per-server capacity by more than an order of magnitude. Bridging the gap between high scalability and high performance would enable either significantly less expensive systems that are able to do the same work or provide the ability to address significantly larger problem sets with the same infrastructure.","PeriodicalId":50918,"journal":{"name":"ACM Transactions on Computer Systems","volume":"10 1","pages":"3"},"PeriodicalIF":1.5,"publicationDate":"2013-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/2427631.2427634","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72527510","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sriram Govindan, Di Wang, A. Sivasubramaniam, B. Urgaonkar
Datacenters spend $10--25 per watt in provisioning their power infrastructure, regardless of the watts actually consumed. Since peak power needs arise rarely, provisioning power infrastructure for them can be expensive. One can, thus, aggressively underprovision infrastructure, assuming that simultaneous peak draw across all equipment will happen rarely. The resulting nonzero probability of emergency events where power needs exceed provisioned capacity, however small, mandates graceful reaction mechanisms to cap the power draw instead of leaving it to disruptive circuit breakers/fuses. Existing strategies for power capping use temporal knobs local to a server that throttle the rate of execution (using power modes), and/or spatial knobs that redirect/migrate excess load to regions of the datacenter with more power headroom. We show these mechanisms to have performance-degrading ramifications, and propose an entirely orthogonal solution that leverages existing UPS batteries to temporarily augment the utility supply during emergencies. We build an experimental prototype to demonstrate such power capping on a cluster of 8 servers, each with an individual battery, and implement several online heuristics in the context of different datacenter workloads to evaluate their effectiveness in handling power emergencies. We show that our battery-based solution can: (i) handle emergencies of short durations on its own, (ii) supplement existing reaction mechanisms to enhance their efficacy for longer emergencies, and (iii) create more slack for shifting applications temporarily to nonpeak durations.
{"title":"Aggressive Datacenter Power Provisioning with Batteries","authors":"Sriram Govindan, Di Wang, A. Sivasubramaniam, B. Urgaonkar","doi":"10.1145/2427631.2427633","DOIUrl":"https://doi.org/10.1145/2427631.2427633","url":null,"abstract":"Datacenters spend $10--25 per watt in provisioning their power infrastructure, regardless of the watts actually consumed. Since peak power needs arise rarely, provisioning power infrastructure for them can be expensive. One can, thus, aggressively underprovision infrastructure assuming that simultaneous peak draw across all equipment will happen rarely. The resulting nonzero probability of emergency events where power needs exceed provisioned capacity, however small, mandates graceful reaction mechanisms to cap the power draw instead of leaving it to disruptive circuit breakers/fuses. Existing strategies for power capping use temporal knobs local to a server that throttle the rate of execution (using power modes), and/or spatial knobs that redirect/migrate excess load to regions of the datacenter with more power headroom. We show these mechanisms to have performance degrading ramifications, and propose an entirely orthogonal solution that leverages existing UPS batteries to temporarily augment the utility supply during emergencies. We build an experimental prototype to demonstrate such power capping on a cluster of 8 servers, each with an individual battery, and implement several online heuristics in the context of different datacenter workloads to evaluate their effectiveness in handling power emergencies. We show that our battery-based solution can: (i) handle emergencies of short durations on its own, (ii) supplement existing reaction mechanisms to enhance their efficacy for longer emergencies, and (iii) create more slack for shifting applications temporarily to nonpeak durations.","PeriodicalId":50918,"journal":{"name":"ACM Transactions on Computer Systems","volume":"14 1","pages":"2"},"PeriodicalIF":1.5,"publicationDate":"2013-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86460420","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Edouard Bugnion, Scott Devine, M. Rosenblum, J. Sugerman, E. Wang
This article describes the historical context, technical challenges, and main implementation techniques used by VMware Workstation to bring virtualization to the x86 architecture in 1999. Although virtual machine monitors (VMMs) had been around for decades, they were traditionally designed as part of monolithic, single-vendor architectures with explicit support for virtualization. In contrast, the x86 architecture lacked virtualization support, and the industry around it had disaggregated into an ecosystem, with different vendors controlling the computers, CPUs, peripherals, operating systems, and applications, none of them asking for virtualization. We chose to build our solution independently of these vendors. As a result, VMware Workstation had to deal with new challenges associated with (i) the lack of virtualization support in the x86 architecture, (ii) the daunting complexity of the architecture itself, (iii) the need to support a broad combination of peripherals, and (iv) the need to offer a simple user experience within existing environments. These new challenges led us to a novel combination of well-known virtualization techniques, techniques from other domains, and new techniques. VMware Workstation combined a hosted architecture with a VMM. The hosted architecture enabled a simple user experience and offered broad hardware compatibility. Rather than exposing I/O diversity to the virtual machines, VMware Workstation also relied on software emulation of I/O devices. The VMM combined a trap-and-emulate direct execution engine with a system-level dynamic binary translator to efficiently virtualize the x86 architecture and support most commodity operating systems. By relying on x86 hardware segmentation as a protection mechanism, the binary translator could execute translated code at near hardware speeds. The binary translator also relied on partial evaluation and adaptive retranslation to reduce the overall overheads of virtualization. Written with the benefit of hindsight, this article shares the key lessons we learned from building the original system and from its later evolution.
{"title":"Bringing Virtualization to the x86 Architecture with the Original VMware Workstation","authors":"Edouard Bugnion, Scott Devine, M. Rosenblum, J. Sugerman, E. Wang","doi":"10.1145/2382553.2382554","DOIUrl":"https://doi.org/10.1145/2382553.2382554","url":null,"abstract":"This article describes the historical context, technical challenges, and main implementation techniques used by VMware Workstation to bring virtualization to the x86 architecture in 1999. Although virtual machine monitors (VMMs) had been around for decades, they were traditionally designed as part of monolithic, single-vendor architectures with explicit support for virtualization. In contrast, the x86 architecture lacked virtualization support, and the industry around it had disaggregated into an ecosystem, with different vendors controlling the computers, CPUs, peripherals, operating systems, and applications, none of them asking for virtualization. We chose to build our solution independently of these vendors.\u0000 As a result, VMware Workstation had to deal with new challenges associated with (i) the lack of virtualization support in the x86 architecture, (ii) the daunting complexity of the architecture itself, (iii) the need to support a broad combination of peripherals, and (iv) the need to offer a simple user experience within existing environments. These new challenges led us to a novel combination of well-known virtualization techniques, techniques from other domains, and new techniques.\u0000 VMware Workstation combined a hosted architecture with a VMM. The hosted architecture enabled a simple user experience and offered broad hardware compatibility. Rather than exposing I/O diversity to the virtual machines, VMware Workstation also relied on software emulation of I/O devices. The VMM combined a trap-and-emulate direct execution engine with a system-level dynamic binary translator to efficiently virtualize the x86 architecture and support most commodity operating systems. By relying on x86 hardware segmentation as a protection mechanism, the binary translator could execute translated code at near hardware speeds. The binary translator also relied on partial evaluation and adaptive retranslation to reduce the overall overheads of virtualization.\u0000 Written with the benefit of hindsight, this article shares the key lessons we learned from building the original system and from its later evolution.","PeriodicalId":50918,"journal":{"name":"ACM Transactions on Computer Systems","volume":"34 1","pages":"12:1-12:51"},"PeriodicalIF":1.5,"publicationDate":"2012-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90911112","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ú. Erlingsson, Marcus Peinado, Simon Peter, M. Budiu, Gloria Mainar-Ruiz
Fay is a flexible platform for the efficient collection, processing, and analysis of software execution traces. Fay provides dynamic tracing through use of runtime instrumentation and distributed aggregation within machines and across clusters. At the lowest level, Fay can be safely extended with new tracing primitives, including even untrusted, fully optimized machine code, and Fay can be applied to running user-mode or kernel-mode software without compromising system stability. At the highest level, Fay provides a unified, declarative means of specifying what events to trace, as well as the aggregation, processing, and analysis of those events. We have implemented the Fay tracing platform for Windows and integrated it with two powerful, expressive systems for distributed programming. Our implementation is easy to use, can be applied to unmodified production systems, and provides primitives that allow the overhead of tracing to be greatly reduced, compared to previous dynamic tracing platforms. To show the generality of Fay tracing, we reimplement, in experiments, a range of tracing strategies and several custom mechanisms from existing tracing frameworks. Fay shows that modern techniques for high-level querying and data-parallel processing of disaggregated data streams are well suited to comprehensive monitoring of software execution in distributed systems. Revisiting a lesson from the late 1960s [Deutsch and Grant 1971], Fay also demonstrates the efficiency and extensibility benefits of using safe, statically verified machine code as the basis for low-level execution tracing. Finally, Fay establishes that, by automatically deriving optimized query plans and code for safe extensions, the expressiveness and performance of high-level tracing queries can equal or even surpass that of specialized monitoring tools.
{"title":"Fay: Extensible Distributed Tracing from Kernels to Clusters","authors":"Ú. Erlingsson, Marcus Peinado, Simon Peter, M. Budiu, Gloria Mainar-Ruiz","doi":"10.1145/2382553.2382555","DOIUrl":"https://doi.org/10.1145/2382553.2382555","url":null,"abstract":"Fay is a flexible platform for the efficient collection, processing, and analysis of software execution traces. Fay provides dynamic tracing through use of runtime instrumentation and distributed aggregation within machines and across clusters. At the lowest level, Fay can be safely extended with new tracing primitives, including even untrusted, fully optimized machine code, and Fay can be applied to running user-mode or kernel-mode software without compromising system stability. At the highest level, Fay provides a unified, declarative means of specifying what events to trace, as well as the aggregation, processing, and analysis of those events.\u0000 We have implemented the Fay tracing platform for Windows and integrated it with two powerful, expressive systems for distributed programming. Our implementation is easy to use, can be applied to unmodified production systems, and provides primitives that allow the overhead of tracing to be greatly reduced, compared to previous dynamic tracing platforms. To show the generality of Fay tracing, we reimplement, in experiments, a range of tracing strategies and several custom mechanisms from existing tracing frameworks.\u0000 Fay shows that modern techniques for high-level querying and data-parallel processing of disagreggated data streams are well suited to comprehensive monitoring of software execution in distributed systems. Revisiting a lesson from the late 1960s [Deutsch and Grant 1971], Fay also demonstrates the efficiency and extensibility benefits of using safe, statically verified machine code as the basis for low-level execution tracing. Finally, Fay establishes that, by automatically deriving optimized query plans and code for safe extensions, the expressiveness and performance of high-level tracing queries can equal or even surpass that of specialized monitoring tools.","PeriodicalId":50918,"journal":{"name":"ACM Transactions on Computer Systems","volume":"60 1","pages":"13:1-13:35"},"PeriodicalIF":1.5,"publicationDate":"2012-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90705185","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Anshul Gandhi, Mor Harchol-Balter, R. Raghunathan, M. Kozuch
Energy costs for data centers continue to rise, already exceeding $15 billion yearly. Sadly, much of this power is wasted. Servers are busy only 10--30% of the time on average, but they are often left on while idle, drawing 60% or more of peak power. We introduce a dynamic capacity management policy, AutoScale, that greatly reduces the number of servers needed in data centers driven by unpredictable, time-varying load, while meeting response time SLAs. AutoScale scales the data center capacity, adding or removing servers as needed. AutoScale has two key features: (i) it autonomically maintains just the right amount of spare capacity to handle bursts in the request rate; and (ii) it is robust not just to changes in the request rate of real-world traces, but also to changes in request size and server efficiency. We evaluate our dynamic capacity management approach via implementation on a 38-server multi-tier data center, serving a web site of the type seen in Facebook or Amazon, with a key-value store workload. We demonstrate that AutoScale vastly improves upon existing dynamic capacity management policies with respect to meeting SLAs and robustness.
{"title":"AutoScale: Dynamic, Robust Capacity Management for Multi-Tier Data Centers","authors":"Anshul Gandhi, Mor Harchol-Balter, R. Raghunathan, M. Kozuch","doi":"10.1145/2382553.2382556","DOIUrl":"https://doi.org/10.1145/2382553.2382556","url":null,"abstract":"Energy costs for data centers continue to rise, already exceeding $15 billion yearly. Sadly much of this power is wasted. Servers are only busy 10--30% of the time on average, but they are often left on, while idle, utilizing 60% or more of peak power when in the idle state.\u0000 We introduce a dynamic capacity management policy, AutoScale, that greatly reduces the number of servers needed in data centers driven by unpredictable, time-varying load, while meeting response time SLAs. AutoScale scales the data center capacity, adding or removing servers as needed. AutoScale has two key features: (i) it autonomically maintains just the right amount of spare capacity to handle bursts in the request rate; and (ii) it is robust not just to changes in the request rate of real-world traces, but also request size and server efficiency.\u0000 We evaluate our dynamic capacity management approach via implementation on a 38-server multi-tier data center, serving a web site of the type seen in Facebook or Amazon, with a key-value store workload. We demonstrate that AutoScale vastly improves upon existing dynamic capacity management policies with respect to meeting SLAs and robustness.","PeriodicalId":50918,"journal":{"name":"ACM Transactions on Computer Systems","volume":"4 1","pages":"14:1-14:26"},"PeriodicalIF":1.5,"publicationDate":"2012-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73616765","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}