Data center temperature prediction and management based on a Two-stage self-healing model

Simulation Modelling Practice and Theory; IF 3.5; Zone 2 (Computer Science); JCR Q2 (Computer Science, Interdisciplinary Applications); Pub Date: 2023-12-28; DOI: 10.1016/j.simpat.2023.102883

Wang Simin , Kang Yifei , Xu Yixuan , Ma Chunmiao , Wang Haitao , Wu Weiguo

Publisher page: https://www.sciencedirect.com/science/article/pii/S1569190X23001600

Citations: 0

Abstract

While data centers provide efficient and convenient cloud services, they also place great pressure on energy consumption and the environment. Rising server temperatures not only increase cooling costs but also seriously threaten the operational safety of the data center. Effective analysis and prediction of data center temperature therefore helps prevent server overheating and shutdown, and is crucial to task scheduling, resource allocation optimization, and energy efficiency improvement. This article proposes a Two-stage Gated Recurrent Unit (GRU) temperature prediction algorithm with a self-healing mechanism. The algorithm first builds a prediction model for CPU utilization, an important parameter affecting temperature prediction, and then feeds the output of that model into the server temperature prediction model as an input parameter, which fits the changes of each parameter more accurately. To avoid the loss of prediction accuracy caused by previously unseen operating conditions and by changes in physical environmental factors while the model is running, a self-healing mechanism is proposed to maintain the prediction accuracy of the model. Experiments show that the prediction model can accurately predict the evolution of server inlet temperature under dynamic workloads. It reduces the prediction error (RMSE) to 0.280, with an average prediction temperature difference of only 0.675, an accuracy improvement of 10% over single-stage prediction. Applying the Two-stage prediction approach to other machine learning methods also improves their prediction accuracy. Based on the prediction model, the paper further proposes a task scheduling algorithm that minimizes temperature difference. The algorithm makes temperatures across servers more balanced after task allocation, effectively reducing the number of servers running at high and low temperatures, avoiding wasted cooling, and saving energy in the data center.
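The data flow described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the two GRU models are replaced by simple stand-in functions (`mean_last3` and `linear_temp`, both hypothetical), the self-healing trigger is assumed to be an RMSE threshold on recent predictions, and the scheduler is assumed to be a greedy search that minimizes the spread of predicted temperatures. None of these details are given in the abstract.

```python
import math
from typing import Callable, List, Sequence


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


def mean_last3(cpu_history: Sequence[float]) -> float:
    """Hypothetical stage-1 stand-in: forecast CPU utilization as the mean of the last 3 samples."""
    window = cpu_history[-3:]
    return sum(window) / len(window)


def linear_temp(cpu_utilization: float) -> float:
    """Hypothetical stage-2 stand-in: a made-up linear map from CPU utilization to temperature."""
    return 20.0 + 0.1 * cpu_utilization


def two_stage_predict(
    cpu_history: Sequence[float],
    predict_cpu: Callable[[Sequence[float]], float],
    predict_temp: Callable[[float], float],
) -> float:
    """Stage 1 forecasts CPU utilization; stage 2 turns that forecast into a temperature."""
    return predict_temp(predict_cpu(cpu_history))


def needs_healing(
    recent_pred: Sequence[float], recent_actual: Sequence[float], threshold: float
) -> bool:
    """Assumed self-healing trigger: flag the model for retraining when recent RMSE exceeds a threshold."""
    return rmse(recent_pred, recent_actual) > threshold


def assign_task(
    server_cpu: Sequence[float], task_load: float, predict_temp: Callable[[float], float]
) -> int:
    """Greedy placement: return the server index that minimizes the max-min spread of predicted temperatures."""
    best_index, best_spread = -1, math.inf
    for i in range(len(server_cpu)):
        trial: List[float] = list(server_cpu)
        trial[i] += task_load  # tentatively place the task on server i
        temps = [predict_temp(u) for u in trial]
        spread = max(temps) - min(temps)
        if spread < best_spread:
            best_index, best_spread = i, spread
    return best_index
```

For example, `assign_task([80.0, 20.0, 50.0], 30.0, linear_temp)` places the task on the least-loaded server, since that choice leaves the smallest gap between the hottest and coolest predicted temperatures. In a real system the two stand-ins would be replaced by trained models, and the retraining step that follows a positive `needs_healing` check would have to be supplied.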

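Source journal

Simulation Modelling Practice and Theory (Engineering & Technology: Computer Science, Interdisciplinary Applications)

CiteScore: 9.80
Self-citation rate: 4.80%
Articles published: 142
Review time: 21 days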

About the journal: The journal Simulation Modelling Practice and Theory provides a forum for original, high-quality papers dealing with any aspect of systems simulation and modelling. The journal aims to be a reference and a powerful tool for all those professionally active in, or interested in, the methods and applications of simulation. Submitted papers will be peer reviewed and must significantly contribute to modelling and simulation in general or use modelling and simulation in application areas. Paper submission is solicited on:
• theoretical aspects of modelling and simulation, including formal modelling, model-checking, random number generators, sensitivity analysis, variance reduction techniques, experimental design, meta-modelling, methods and algorithms for validation and verification, selection and comparison procedures, etc.;
• methodology and application of modelling and simulation in any area, including computer systems, networks, real-time and embedded systems, mobile and intelligent agents, manufacturing and transportation systems, management, engineering, biomedical engineering, economics, ecology and environment, education, transaction handling, etc.;
• simulation languages and environments, including those specific to distributed computing, grid computing, high performance computers or computer networks, etc.;
• distributed and real-time simulation, simulation interoperability;
• tools for high performance computing simulation, including dedicated architectures and parallel computing.