Kazi Main Uddin Ahmed, Math H. J. Bollen, Manuel Alvarez
IEEE Transactions on Sustainable Computing (journal article), published 2022-10-21. DOI: 10.1109/TSUSC.2022.3216350. https://ieeexplore.ieee.org/document/9926075/
A Stochastic Approach to Determine the Optimal Number of Servers for Reliable and Energy Efficient Operation of Data Centers
The growing demand for data center computational capacity in recent years has introduced new operational challenges, among them maintaining service level agreements (SLAs) and quality of service (QoS) while limiting energy consumption. Because servers define the computational capability of a data center, this paper presents a stochastic operational risk assessment approach that estimates the required number of spare servers while accounting for the risk of server failures during operation. A reliability index, the "risk of computational resource commitment" (RCRC), is introduced to quantify the probability of having insufficient spare servers due to failures during the operational lead time; its complement measures the ability of the resources to maintain the data center's SLA. Server failure rates are obtained by Monte Carlo simulation using failure data published by Google in 2019. The analysis shows that the RCRC decreases as the number of spare servers increases, but additional spare servers also strain the energy efficiency of the data center. The RCRC index can be used in data center operation to avoid overprovisioning servers and to limit the number of spare servers, striking a suitable balance between QoS and energy consumption.
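The abstract defines the RCRC only in words. The sketch below shows one way such an index could be computed; it is not the authors' method. It assumes (i) servers fail independently with a constant failure rate λ, so the probability that one server fails within a lead time T is 1 − exp(−λT), and (ii) the RCRC with S spares is the probability that more than S of the N active servers fail within T, i.e. the upper tail of a binomial distribution. The function names (`outage_probability`, `rcrc_analytic`, `rcrc_monte_carlo`, `min_spares`) and the demo numbers are hypothetical and illustrative, not values from the paper or from Google's 2019 data. The paper uses Monte Carlo simulation to estimate failure rates from data; here a Monte Carlo routine only cross-checks the analytic formula.

```python
import math
import random


def outage_probability(failure_rate: float, lead_time: float) -> float:
    """Probability that a single server fails within the lead time.

    Assumption: constant failure rate (exponential time to failure).
    failure_rate and lead_time must use the same time unit.
    """
    return 1.0 - math.exp(-failure_rate * lead_time)


def rcrc_analytic(n_active: int, n_spare: int, p_fail: float) -> float:
    """Hypothetical RCRC: P(more than n_spare of n_active servers fail).

    Assumption: independent, identical servers, so the failure count during
    the lead time is Binomial(n_active, p_fail). The upper tail is summed
    directly to avoid cancellation in 1 - P(K <= n_spare).
    """
    return sum(
        math.comb(n_active, k) * p_fail**k * (1.0 - p_fail) ** (n_active - k)
        for k in range(n_spare + 1, n_active + 1)
    )


def rcrc_monte_carlo(n_active: int, n_spare: int, failure_rate: float,
                     lead_time: float, trials: int = 10_000, seed: int = 1) -> float:
    """Monte Carlo estimate of the same probability.

    Each trial draws an exponential time to failure for every active server
    and records a shortfall when more servers fail within the lead time than
    there are spares. Repairs during the lead time are ignored.
    """
    rng = random.Random(seed)
    shortfalls = 0
    for _ in range(trials):
        failures = sum(
            1 for _ in range(n_active) if rng.expovariate(failure_rate) <= lead_time
        )
        if failures > n_spare:
            shortfalls += 1
    return shortfalls / trials


def min_spares(n_active: int, p_fail: float, target_risk: float) -> int:
    """Smallest spare count whose RCRC does not exceed target_risk."""
    for s in range(n_active + 1):
        if rcrc_analytic(n_active, s, p_fail) <= target_risk:
            return s
    return n_active  # only reached if target_risk < 0


if __name__ == "__main__":
    # Illustrative numbers only, not taken from the paper or the Google data.
    rate_per_hour = 1e-4
    lead_time_hours = 168.0  # one week
    n_servers = 50
    p = outage_probability(rate_per_hour, lead_time_hours)
    for spares in range(3):
        analytic = rcrc_analytic(n_servers, spares, p)
        simulated = rcrc_monte_carlo(n_servers, spares, rate_per_hour, lead_time_hours)
        print(spares, round(analytic, 4), round(simulated, 4))
    print("spares for RCRC <= 1%:", min_spares(n_servers, p, 0.01))
```

Under these assumptions each extra spare removes one more term from the binomial tail, so the RCRC falls monotonically, and `min_spares` stops at the first count that meets the chosen risk target rather than adding spares (and their energy cost) beyond it.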