Context-aware reinforcement learning for cooling operation of data centers with an Aquifer Thermal Energy Storage

Lukas Leindals, Peter Grønning, Dominik Franjo Dominković, Rune Grønborg Junker

Energy and AI, Volume 17, Article 100395, published 2024-07-05
DOI: 10.1016/j.egyai.2024.100395
https://www.sciencedirect.com/science/article/pii/S2666546824000612
Citation count: 0
Abstract
Data centers are often equipped with multiple cooling units. Here, an aquifer thermal energy storage (ATES) system has been shown to be efficient. However, the usage of the hot- and cold-water wells in the ATES must be balanced for legal and environmental reasons. Reinforcement learning has proven to be a useful tool for optimizing the cooling operation of data centers. Nonetheless, since cooling demand changes continuously, balancing the ATES usage on a yearly basis imposes an additional challenge in the form of a delayed reward. To overcome this, we formulate a return decomposition, Cool-RUDDER, which relies on simple domain knowledge and requires no training. We trained a proximal policy optimization agent to keep server temperatures steady while minimizing operational costs. Comparing the Cool-RUDDER reward signal to other ATES-associated rewards, all models kept the server temperatures steady at around 30 °C. An optimal ATES balance was defined as 0%, and the Cool 2.0 reward achieved a yearly imbalance of −4.9% with a confidence interval of [−6.2, −3.8]%. This outperformed a baseline ATES-associated reward of zero, which achieved −16.3% with a confidence interval of [−17.1, −15.4]%, as well as all other ATES-associated rewards. However, the improved ATES balance comes at a 12.5% higher energy consumption cost when comparing the relative cost of the Cool 2.0 reward to the zero reward, resulting in a trade-off. Moreover, the method has limited requirements and is applicable to any long-term problem satisfying a linear state-transition system.
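The core idea described above, redistributing a delayed end-of-year reward across individual timesteps, can be illustrated with a minimal sketch. The snippet below is not the paper's Cool-RUDDER formulation; it is a hypothetical toy example (function name and variables are assumptions) showing how a delayed penalty can be decomposed step-by-step when the delayed quantity, such as the yearly ATES imbalance, is a linear function of the per-step actions.

```python
import numpy as np

def redistribute_delayed_reward(flows, final_penalty):
    """Hypothetical RUDDER-style return decomposition (illustrative only).

    `flows` holds the net water moved to the hot well at each timestep
    (positive = warming the aquifer, negative = cooling it). Because the
    yearly imbalance is linear in these per-step flows, the delayed
    end-of-year penalty can be redistributed in proportion to each step's
    contribution while leaving the total return unchanged.
    """
    flows = np.asarray(flows, dtype=float)
    total = flows.sum()
    if np.isclose(total, 0.0):
        # A perfectly balanced year: nothing to redistribute.
        return np.zeros_like(flows)
    # Each timestep receives the share of the penalty it is responsible for.
    return final_penalty * flows / total

# Toy usage: three steps of net hot-well usage and a delayed penalty of -1.
step_rewards = redistribute_delayed_reward([0.2, 0.5, 0.3], final_penalty=-1.0)
print(step_rewards)        # per-step shaped rewards
print(step_rewards.sum())  # sums back to the original delayed penalty (-1.0)
```

Under this assumed setup, an agent such as the proximal policy optimization agent mentioned in the abstract would see an informative reward at every step instead of a single delayed signal at the end of the year.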