
ACM SIGOPS Oper. Syst. Rev. Latest Articles

Weir: a streaming language for performance analysis
Pub Date : 2014-05-15 DOI: 10.1145/2626401.2626415
A. Burtsev, Nikhil Mishrikoti, E. Eide, R. Ricci
For modern software systems, performance analysis can be a challenging task. The software stack can be a complex, multi-layer, multi-component, concurrent, and parallel environment with multiple contexts of execution and multiple sources of performance data. Although much performance data is available, because modern systems incorporate many mature data-collection mechanisms, analysis algorithms suffer from the lack of a unifying programming environment for processing the collected performance data, potentially from multiple sources, in a convenient and script-like manner. This paper presents Weir, a streaming language for systems performance analysis. Weir is based on the insight that performance-analysis algorithms can be naturally expressed as stream-processing pipelines. In Weir, an analysis algorithm is implemented as a graph composed of stages, where each stage operates on a stream of events that represent collected performance measurements. Weir is an imperative streaming language with a syntax designed for the convenient construction of stream pipelines that utilize composable and reusable analysis stages. To demonstrate practical application, this paper presents the authors' experience in using Weir to analyze performance in systems based on the Xen virtualization platform.
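The idea of an analysis expressed as stages over an event stream can be illustrated with a minimal sketch. The abstract does not show Weir's syntax, so the Python below is only an analogy and not Weir code: the `Event` fields and the `select`, `windowed_mean`, and `pipeline` names are hypothetical, and events are assumed to arrive in timestamp order.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass(frozen=True)
class Event:
    # Hypothetical event record; the field names are illustrative, not Weir's.
    timestamp_us: int
    source: str
    kind: str
    value: float


# A stage consumes one event stream and produces another.
Stage = Callable[[Iterable[Event]], Iterator[Event]]


def select(kind: str) -> Stage:
    """Stage that passes through only events of the given kind."""
    def stage(events: Iterable[Event]) -> Iterator[Event]:
        return (e for e in events if e.kind == kind)
    return stage


def windowed_mean(window_us: int) -> Stage:
    """Stage that emits one mean-value event per fixed time window.

    Assumes the input is ordered by timestamp.
    """
    def stage(events: Iterable[Event]) -> Iterator[Event]:
        bucket, total, count = None, 0.0, 0
        for e in events:
            b = e.timestamp_us // window_us
            if bucket is not None and b != bucket:
                # The window changed: flush the mean of the finished window.
                yield Event(bucket * window_us, "derived", "mean", total / count)
                total, count = 0.0, 0
            bucket = b
            total += e.value
            count += 1
        if count:
            # Flush the last, possibly partial, window.
            yield Event(bucket * window_us, "derived", "mean", total / count)
    return stage


def pipeline(*stages: Stage) -> Stage:
    """Chain stages so that each one consumes the previous stage's output."""
    def run(events: Iterable[Event]) -> Iterator[Event]:
        for s in stages:
            events = s(events)
        return iter(events)
    return run
```

Because every stage has the same stream-in, stream-out shape, stages such as `select("latency")` and `windowed_mean(1000)` can be reused and recombined freely, which is the composability property the abstract emphasizes.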
Pages : 65-70
Citations: 1
File systems deserve verification too!
Pub Date : 2014-05-15 DOI: 10.1145/2626401.2626414
G. Keller, Toby C. Murray, Sidney Amani, Liam O'Connor, Zilin Chen, L. Ryzhyk, G. Klein, G. Heiser
File systems are too important, and current ones are too buggy, to remain unverified. Yet the most successful verification methods for functional correctness remain too expensive for current file system implementations: we need verified correctness, but at reasonable cost. This paper presents our vision and ongoing work to achieve this goal for a new high-performance flash file system, called BilbyFs. BilbyFs is carefully designed to be highly modular, so it can be verified against a high-level functional specification one component at a time. This modular implementation is captured in a set of domain-specific languages from which we produce the design-level specification, as well as its optimised C implementation. Importantly, we also automatically generate the proof linking these two artefacts. The combination of these features dramatically reduces verification effort. Verified file systems are now within reach for the first time.
Pages : 58-64
Citations: 2
Coordinating multiple administration loops using discrete control
Pub Date : 2013-11-26 DOI: 10.1145/2553070.2553074
Soguy Mak Karé Gueye, N. D. Palma, É. Rutten, A. Tchana
The increasing complexity of computer systems has led to the automation of administration functions, in the form of autonomic managers. One important aspect requiring such management is the energy consumption of computing systems, from the perspective of green computing. As each of these managers addresses a specific aspect, several managers are needed to cover all the domains of administration. However, coordinating them is necessary for proper and effective global administration. Such coordination is a problem of synchronization and logical control of the administration operations that autonomous managers can apply to the managed system at a given time, in response to events observed on the state of this system. We therefore propose to investigate the use of reactive models with events and states, and discrete control techniques, to solve this problem. In this paper, we illustrate this approach by integrating a controller obtained by synchronous programming, based on Discrete Controller Synthesis, in an autonomic system administration infrastructure. The role of this controller is to orchestrate the execution of reconfiguration operations of all administration policies to satisfy properties of logical consistency. We apply this approach to coordinate three managers: two energy-aware ones, which control server provisioning and processor frequency, and a repair manager.
Pages : 18-25
Citations: 4
An energy-efficient self-provisioning approach for cloud resources management
Pub Date : 2013-11-26 DOI: 10.1145/2553070.2553072
Hanen Chihi, Walid Chainbi, K. Ghédira
In recent years, energy conservation has become a major issue in information technology. Cloud computing is an emerging model for distributed utility computing and is considered an attractive opportunity for saving energy through central management of computational resources. A substantial reduction in energy consumption can be achieved by powering down servers when they are not in use. This work presents a resource-provisioning approach based on an unsupervised predictor model in the form of a recurrent neural network built on a self-organizing map. Another unique feature of our work is a resource-administration strategy for energy saving in the cloud. This strategy is implemented as a self-administration module. We show that the proposed approach gives promising results.
Pages : 2-9
Citations: 11
Reliability aware dynamic voltage and frequency scaling for improved microprocessor lifetime
Pub Date : 2013-11-26 DOI: 10.1145/2553070.2553073
Naga Pavan Kumar Gorti, Arun Kumar Somani
Dynamic voltage and frequency scaling (DVFS) is heavily used for power management in real-time environments. Although schemes leveraging DVFS provide significant power reduction, adverse effects on chip reliability are possible. Alternating increases and decreases in operating voltage and frequency lead to thermal cycling. Increasing transistor packing density leads to a larger range of possible operating temperatures, exacerbating the thermal cycling problem. Also, the chip reliability quantification process does not include or represent the effects of small-scale thermal cycles. A good number of in-field chip failures are attributed to the consequences of such cycles. Thus, it is imperative to include their effects in the processor voltage and frequency selection process. Our work develops an integrated processor thermal and performance management technique centered on novel polynomial-time scheduling algorithms that reduce thermal cycles in soft real-time environments. Our technique leverages application awareness and runtime monitoring to improve chip lifetime, while achieving considerable energy savings. We show that a significant reduction in thermal cycles and peaks is possible, leading to longer chip life expectations.
Pages : 10-17
Citations: 3
A study on micro level traffic prediction for energy-aware routers
Pub Date : 2013-11-26 DOI: 10.1145/2553070.2553075
Sou Koyano, S. Ata, H. Iwamoto, Yuji Yano, Y. Kuroda, K. Inoue, I. Oka
For green networking, the Sliced Router Architecture was proposed, which controls the power consumption of routers by adjusting their performance on the basis of the volume of traffic. In this architecture, traffic prediction is used for appropriate power control of the router. To obtain an efficient power reduction, we need to consider the impact of overestimation and underestimation. In this paper, we propose a traffic prediction method that considers the impact of overestimation and underestimation on the power efficiency and processing performance of the Sliced Router Architecture. We evaluate our method by trace-driven simulations with real traffic, and we show that our approach can control the power consumption of the Sliced Router without significant performance degradation.
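The trade-off between overestimating and underestimating traffic can be made concrete with a small sketch. The abstract does not describe the authors' predictor, so the code below swaps in a generic sliding-window quantile predictor as a stand-in: choosing a quantile above 0.5 deliberately biases the forecast upward, trading some power savings for a lower risk of under-provisioning. The class name, the `slices_needed` helper, and the notion of a fixed per-slice capacity are all assumptions made for illustration.

```python
import math
from collections import deque


class QuantilePredictor:
    """Hypothetical predictor: forecasts the next traffic rate as a quantile
    of the most recent observations (not the method from the paper)."""

    def __init__(self, window: int, quantile: float) -> None:
        if window < 1 or not 0.0 < quantile <= 1.0:
            raise ValueError("window must be >= 1 and 0 < quantile <= 1")
        self.history = deque(maxlen=window)  # keeps only the newest samples
        self.quantile = quantile

    def observe(self, rate: float) -> None:
        """Record one measured traffic rate."""
        self.history.append(rate)

    def predict(self) -> float:
        """Return the chosen quantile of the window (0.0 if no data yet)."""
        if not self.history:
            return 0.0
        ordered = sorted(self.history)
        # Nearest-rank quantile: the smallest sample whose rank covers it.
        index = max(0, math.ceil(self.quantile * len(ordered)) - 1)
        return ordered[index]


def slices_needed(predicted_rate: float, slice_capacity: float) -> int:
    """Number of router slices to keep powered, assuming each slice can
    process `slice_capacity` traffic units; at least one stays on."""
    return max(1, math.ceil(predicted_rate / slice_capacity))
```

Raising the quantile keeps more slices powered and wastes energy when traffic stays low, while lowering it saves power but risks dropping or delaying packets during bursts, which is the tension the abstract describes.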
Pages : 26-33
Citations: 2
A review of energy measurement approaches
Pub Date : 2013-11-26 DOI: 10.1145/2553070.2553077
Adel Noureddine, Romain Rouvoy, L. Seinturier
Reducing the energy footprint of digital devices and software is a challenging task for research in Green IT. Researchers have proposed approaches for energy management, ranging from reducing the usage of software and hardware and compiler optimization to server consolidation and software migration. However, optimizing energy consumption requires knowledge of that consumption. In particular, measuring the energy consumption of hardware and software is an important requirement for efficient energy strategies. In this review, we outline the different categories of approaches to energy measurement and provide insights into examples of each category. From our review, we draw recommendations on the requirements for efficiently measuring the energy consumption of devices and software.
Pages : 42-49
Citations: 74
Performance troubleshooting in data centers: an annotated bibliography?
Pub Date : 2013-11-26 DOI: 10.1145/2553070.2553079
Chengwei Wang, Soila Kavulya, Jiaqi Tan, Liting Hu, Mahendra Kutare, Michael P. Kasick, K. Schwan, P. Narasimhan, R. Gandhi
In the emerging cloud computing era, enterprise data centers host a plethora of web services and applications, including those for e-Commerce, distributed multimedia, and social networks, which jointly serve many aspects of our daily lives and business. For such applications, lack of availability, reliability, or responsiveness can lead to extensive losses. For instance, on June 29, 2010, Amazon.com experienced three hours of intermittent performance problems as the normally reliable website took minutes to load items, and searches came back without product links. Customers were also unable to place orders. Based on their 2010 quarterly revenues, such downtime could cost Amazon up to $1.75 million per hour, thus making rapid problem resolution critical to its business. In another serious incident, on July 7, 2010, DBS bank in Singapore suffered a 7-hour outage which crippled its Internet banking systems and disrupted other consumer banking services, including automated teller machines, credit card and NETS payments. The cascading failure occurred due to a procedural error while replacing a faulty component in one of the bank’s storage systems that was connected to its main computers. The high cost of downtime in large-scale distributed systems drives the need for troubleshooting tools that can quickly detect problems and point system administrators to potential solutions. The increasing size and complexity of enterprise applications, coupled with the large scale of the data centers in which they operate, make troubleshooting extremely challenging. Problems can arise from a large variety of root causes because of the complex interactions between hardware and software systems. The large volume of monitoring data available in these systems can obscure the root cause of these problems. Lastly, the multi-tier nature of applications composed of entirely different subsystems man-
Pages : 50-62
Citations: 35
Your server as a function
Pub Date : 2013-11-03 DOI: 10.1145/2525528.2525538
Marius Eriksen
Building server software in a large-scale setting, where systems exhibit a high degree of concurrency and environmental variability, is a challenging task for even the most experienced programmer. Efficiency, safety, and robustness are paramount, yet these goals have traditionally conflicted with modularity, reusability, and flexibility. We describe three abstractions which combine to present a powerful programming model for building safe, modular, and efficient server software: composable futures are used to relate concurrent, asynchronous actions; services and filters are specialized functions used for the modular composition of our complex server software. Finally, we discuss our experiences using these abstractions and techniques throughout Twitter's serving infrastructure.
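The three abstractions named in the abstract can be sketched in a few lines. The example below is an analogy in Python's `asyncio`, not the API of the system described in the paper: an awaitable plays the role of a future, a service is assumed to be an asynchronous function from request to response, and a filter is assumed to be a function that wraps a service. The names `and_then`, `timeout_filter`, `echo_service`, and `fan_out` are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, List

# A service maps a request to an eventual response (a "future").
Service = Callable[[str], Awaitable[str]]
# A filter receives the request plus the next service and decides how to call it.
Filter = Callable[[str, Service], Awaitable[str]]


def and_then(flt: Filter, service: Service) -> Service:
    """Compose a filter with a service; the result is itself a service."""
    async def composed(request: str) -> str:
        return await flt(request, service)
    return composed


def timeout_filter(seconds: float) -> Filter:
    """Filter that fails the request if the wrapped service is too slow."""
    async def flt(request: str, service: Service) -> str:
        return await asyncio.wait_for(service(request), timeout=seconds)
    return flt


async def echo_service(request: str) -> str:
    """Toy service; the sleep stands in for real asynchronous I/O."""
    await asyncio.sleep(0)
    return f"echo:{request}"


async def fan_out(service: Service, requests: List[str]) -> List[str]:
    """Issue all requests concurrently and collect responses in order."""
    return list(await asyncio.gather(*(service(r) for r in requests)))
```

Since `and_then(timeout_filter(0.5), echo_service)` has the same type as `echo_service`, cross-cutting behavior such as timeouts can be layered on without touching the service itself, which is the kind of modular composition the abstract refers to.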
Pages : 51-57
Citations: 28
Verifying cloud services: present and future
Pub Date : 2013-07-23 DOI: 10.1145/2506164.2506167
S. Bouchenak, G. Chockler, Hana Chockler, Gabriela Gheorghe, Nuno Santos, A. Shraer
As cloud-based services gain popularity in both private and enterprise domains, cloud consumers still lack tools to verify that these services work as expected. Such tools should consider properties such as functional correctness, service availability, reliability, performance, and security guarantees. In this paper we survey existing work in these areas and identify gaps in existing cloud technology in terms of the verification tools provided to users. We also discuss challenges and new research directions that can help bridge these gaps.
Citations: 69