An Availability-on-Demand Mechanism for Datacenters

2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing. Pub Date: 2015-05-04. DOI: 10.1109/CCGrid.2015.58

S. Shen, A. Iosup, A. Israel, W. Cirne, D. Raz, D. Epema

Pages: 495-504

Citations: 17

Abstract

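Datacenters are at the core of a wide variety of daily ICT utilities, ranging from scientific computing to online gaming. Due to the scale of today's datacenters, the failure of computing resources is a common occurrence that may disrupt the availability of ICT services, leading to revenue loss. Although many high-availability (HA) techniques have been proposed to mask resource failures, datacenter users, who rent datacenter resources and use them to provide ICT utilities to a global population, still have limited management options for dynamically selecting and configuring HA techniques. In this work, we propose Availability-on-Demand (AoD), a mechanism consisting of an API that allows datacenter users to specify availability requirements that can change dynamically, and an availability-aware scheduler that dynamically manages computing resources based on user-specified requirements. The mechanism operates at the level of individual service instances, thus enabling fine-grained control of availability, for example during sudden requirement changes and periodic operations. Through realistic, trace-based simulations, we show that the AoD mechanism can achieve high availability at low cost. The AoD approach consumes about the same number of CPU hours as approaches that use HA techniques randomly, but achieves higher availability. Moreover, compared to an ideal approach that has perfect predictions about failures, it consumes 13% to 31% more CPU hours but achieves similar availability for critical parts of applications.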
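To make the two parts of the mechanism concrete, the following is a minimal sketch of how a per-instance requirement API and an availability-aware scheduler could fit together. It is not the authors' implementation: the abstract does not give the API, so every name here (`Level`, `Instance`, `AvailabilityAwareScheduler`, `set_requirement`, `reserved_cpus`) is hypothetical, and the choice of one standby replica as the HA technique for critical instances is an assumption made only for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Level(Enum):
    """Hypothetical availability levels a user can request per instance."""
    BEST_EFFORT = 0  # no HA technique applied
    CRITICAL = 1     # protect the instance with an HA technique


@dataclass
class Instance:
    name: str
    cpus: int
    level: Level = Level.BEST_EFFORT


@dataclass
class AvailabilityAwareScheduler:
    """Toy scheduler (assumption): one standby replica per CRITICAL instance."""
    instances: dict = field(default_factory=dict)

    def submit(self, instance: Instance) -> None:
        self.instances[instance.name] = instance

    def set_requirement(self, name: str, level: Level) -> None:
        """API call: the user changes an instance's requirement at run time."""
        self.instances[name].level = level

    def reserved_cpus(self) -> int:
        """CPUs reserved, counting the extra replica of each CRITICAL instance."""
        return sum(
            inst.cpus * (2 if inst.level is Level.CRITICAL else 1)
            for inst in self.instances.values()
        )


scheduler = AvailabilityAwareScheduler()
scheduler.submit(Instance("web-frontend", cpus=4))
scheduler.submit(Instance("batch-report", cpus=8))
baseline = scheduler.reserved_cpus()

# Sudden requirement change: the front end becomes critical for a while.
scheduler.set_requirement("web-frontend", Level.CRITICAL)
during_peak = scheduler.reserved_cpus()

# The requirement is relaxed again once the critical period ends.
scheduler.set_requirement("web-frontend", Level.BEST_EFFORT)
after_peak = scheduler.reserved_cpus()

print(baseline, during_peak, after_peak)
```

In this toy model the extra CPU cost is paid only while an instance is marked critical, which is the kind of fine-grained, per-instance control the abstract describes. The numbers it prints say nothing about the 13% to 31% overhead reported from the trace-based simulations.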


