A Stochastic Approach to Determine the Optimal Number of Servers for Reliable and Energy Efficient Operation of Data Centers

Impact factor 3.9 | Zone 3, Computer Science | JCR Q2, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | IEEE Transactions on Sustainable Computing | Pub Date: 2022-10-21 | DOI: 10.1109/TSUSC.2022.3216350

Kazi Main Uddin Ahmed, Math H. J. Bollen, Manuel Alvarez

IEEE Transactions on Sustainable Computing, vol. 8, no. 2, pp. 153-164. IEEE Xplore: https://ieeexplore.ieee.org/document/9926075/

Citations: 0

Abstract

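In recent years, the growing demand for data center computational capacity has introduced new operational challenges, among them maintaining service level agreements (SLA) and quality of service (QoS) while limiting energy consumption. Since servers define the computational capability of a data center, this paper presents a stochastic operational risk assessment approach that estimates the required number of spare servers in a data center, taking into account the risk of server failures during operation. A reliability index called "risk of computational resource commitment (RCRC)" is introduced. It quantifies the probability of having insufficient spare servers due to failures during the operational lead time, and its complement (1 - RCRC) indicates the ability of the resources to maintain the SLA of a data center. The failure rates of the servers are obtained using a Monte Carlo simulation with the failure data published by Google in 2019. The analysis shows that the RCRC decreases as the number of spare servers increases, while the additional spare servers also put pressure on the energy efficiency of the data center. The RCRC index could be used in data center operation to avoid overprovisioning of servers and to limit the number of spare servers, while striking a suitable balance between the QoS and the energy consumption of data centers.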
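The abstract defines RCRC only in words. The sketch below shows one way such an index could be computed; it is an illustrative assumption, not the paper's model. It assumes that servers fail independently with a constant failure rate λ (exponential lifetimes), so one server fails within a lead time T with probability p = 1 - exp(-λT); that all N active servers and S spares are powered on (hot standby) and therefore exposed to failure; and that the SLA is at risk when more than S servers fail. Under these assumptions RCRC(S) = P(X > S) with X ~ Binomial(N + S, p). The function names (`failure_prob`, `rcrc`, `min_spares`, `rcrc_monte_carlo`), the RCRC target of 1e-3, and every numeric input are hypothetical. The Monte Carlo routine here only cross-checks the binomial formula; in the paper, Monte Carlo simulation is used to derive failure rates from Google's 2019 data.

```python
import math
import random


def failure_prob(lam: float, lead_time: float) -> float:
    """P(one server fails within the lead time), assuming a constant
    failure rate lam (exponential lifetimes): p = 1 - exp(-lam * T)."""
    return -math.expm1(-lam * lead_time)


def rcrc(n_active: int, n_spare: int, p: float) -> float:
    """Hypothetical RCRC: probability that more than n_spare of the
    n_active + n_spare powered-on servers fail during the lead time,
    i.e. the upper binomial tail P(X > n_spare), X ~ Bin(n_active + n_spare, p).
    Failures are assumed independent."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    m = n_active + n_spare
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for k in range(n_spare + 1, m + 1):
        # Binomial pmf in log space (lgamma) so large fleets do not overflow.
        log_pmf = (math.lgamma(m + 1) - math.lgamma(k + 1) - math.lgamma(m - k + 1)
                   + k * log_p + (m - k) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)


def min_spares(n_active: int, p: float, target: float, max_spares: int = 10_000) -> int:
    """Smallest number of spares whose RCRC does not exceed target.
    RCRC is non-increasing in the spare count, so a linear scan suffices."""
    for s in range(max_spares + 1):
        if rcrc(n_active, s, p) <= target:
            return s
    raise ValueError("target not reachable within max_spares")


def rcrc_monte_carlo(n_active: int, n_spare: int, lam: float, lead_time: float,
                     trials: int = 2000, seed: int = 1) -> float:
    """Cross-check: sample exponential lifetimes and count how often more
    than n_spare servers fail before the lead time ends."""
    rng = random.Random(seed)
    m = n_active + n_spare
    shortfalls = 0
    for _ in range(trials):
        failures = sum(rng.expovariate(lam) < lead_time for _ in range(m))
        if failures > n_spare:
            shortfalls += 1
    return shortfalls / trials


if __name__ == "__main__":
    # Hypothetical inputs, NOT values from the paper.
    LAM = 1e-4        # failures per server-hour
    LEAD_TIME = 24.0  # operational lead time, hours
    N_ACTIVE = 1000   # servers needed to meet the SLA
    P_IDLE_KW = 0.1   # idle power draw of one hot spare, kW

    p = failure_prob(LAM, LEAD_TIME)
    for s in (0, 2, 4, 6, 8):
        spare_kwh = s * P_IDLE_KW * LEAD_TIME  # extra energy spent on spares
        print(f"spares={s}: RCRC={rcrc(N_ACTIVE, s, p):.2e}, spare energy={spare_kwh:.1f} kWh")

    print("minimum spares for RCRC <= 1e-3:", min_spares(N_ACTIVE, p, target=1e-3))
    # The sampled estimate should lie close to the analytical value.
    print("analytical RCRC, 2 spares:", rcrc(N_ACTIVE, 2, p))
    print("Monte Carlo RCRC, 2 spares:", rcrc_monte_carlo(N_ACTIVE, 2, LAM, LEAD_TIME, trials=1000))
```

In this sketch a stricter RCRC target requires more spares, and each extra hot spare adds idle energy linearly, which mirrors the QoS-versus-energy trade-off described in the abstract. Modelling spares as cold standby (not exposed to failure) would instead use X ~ Binomial(N, p).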




Source journal

IEEE Transactions on Sustainable Computing (Mathematics-Control and Optimization)

CiteScore

7.70

Self-citation rate

2.60%

Articles published