A multi-agent reinforcement learning-based method for server energy efficiency optimization combining DVFS and dynamic fan control

Wenjun Lin, Weiwei Lin, Jianpeng Lin, Haocheng Zhong, Jiangtao Wang, Ligang He

Sustainable Computing: Informatics and Systems, Volume 42, Article 100977
Published: 2024-02-08
DOI: 10.1016/j.suscom.2024.100977
Citations: 0
Abstract
With the rapid development of the digital economy and intelligent industry, the energy consumption of data centers (DCs) has increased significantly. Various optimization methods have been proposed to improve the energy efficiency of servers in DCs. However, existing solutions usually rely on model-based heuristics and best practices to select operations, which are not universally applicable. Moreover, existing works primarily focus on optimizing individual components, with little work on the joint optimization of multiple components. Therefore, we propose a multi-agent reinforcement learning-based method, named MRDF, which combines DVFS and dynamic fan control to achieve a trade-off between power consumption and performance while satisfying thermal constraints. MRDF is model-free and learns by continuously interacting with the real server, without prior knowledge. To enhance the stability of MRDF in dynamic environments, we design a data-driven baseline comparison method to evaluate the actual contribution of a single agent to the global reward. In addition, an improved Q-learning algorithm is proposed to deal with the large state and action space of the multi-core server. We implement MRDF on a Huawei Taishan 200 server and verify its effectiveness by running benchmarks. Experimental results show that the proposed method improves energy efficiency by an average of 3.9% compared to the best baseline solution, while flexibly adapting to different thermal constraints.
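The abstract's core ideas can be illustrated with a minimal sketch: two Q-learning agents (one choosing a DVFS frequency, one a fan speed) share a global energy-efficiency reward with a thermal penalty, and each agent is credited via a baseline comparison against its alternative actions. Note this is an illustrative toy only: the paper does not publish MRDF's implementation, so the frequency/fan levels, the simulated server model, the reward weights, and the stateless (bandit-style) simplification below are all hypothetical stand-ins for the concepts named in the abstract, not the authors' actual method.

```python
import random

FREQS = [1.5, 2.0, 2.6]    # GHz levels for the DVFS agent (assumed values)
FAN_SPEEDS = [30, 60, 90]  # fan duty-cycle levels in % (assumed values)
TEMP_LIMIT = 75.0          # thermal constraint in degrees Celsius (assumed)

def simulate(freq, fan):
    """Toy server model: higher frequency raises performance, power, and
    temperature; higher fan speed adds cooling power but lowers temperature."""
    perf = freq * 100.0
    power = 20.0 + 15.0 * freq ** 2 + 0.05 * fan
    temp = 40.0 + 12.0 * freq - 0.2 * fan
    return perf, power, temp

def reward(perf, power, temp):
    """Global reward: energy efficiency (perf per watt) with a thermal penalty."""
    r = perf / power
    if temp > TEMP_LIMIT:
        r -= 10.0  # heavy penalty for violating the thermal constraint
    return r

def train(episodes=2000, alpha=0.1, eps=0.2, seed=0):
    rng = random.Random(seed)
    # One Q-table per agent over its own action space.
    q_dvfs = [0.0] * len(FREQS)
    q_fan = [0.0] * len(FAN_SPEEDS)
    for _ in range(episodes):
        # Epsilon-greedy action selection for each agent independently.
        a_f = rng.randrange(len(FREQS)) if rng.random() < eps \
            else max(range(len(FREQS)), key=q_dvfs.__getitem__)
        a_c = rng.randrange(len(FAN_SPEEDS)) if rng.random() < eps \
            else max(range(len(FAN_SPEEDS)), key=q_fan.__getitem__)
        r = reward(*simulate(FREQS[a_f], FAN_SPEEDS[a_c]))
        # Baseline comparison (inspired by the abstract's credit-assignment
        # idea): credit each agent with its reward minus the average reward of
        # its alternatives, holding the other agent's action fixed.
        base_f = sum(reward(*simulate(f, FAN_SPEEDS[a_c])) for f in FREQS) / len(FREQS)
        base_c = sum(reward(*simulate(FREQS[a_f], s)) for s in FAN_SPEEDS) / len(FAN_SPEEDS)
        q_dvfs[a_f] += alpha * ((r - base_f) - q_dvfs[a_f])
        q_fan[a_c] += alpha * ((r - base_c) - q_fan[a_c])
    best_f = FREQS[max(range(len(FREQS)), key=q_dvfs.__getitem__)]
    best_c = FAN_SPEEDS[max(range(len(FAN_SPEEDS)), key=q_fan.__getitem__)]
    return best_f, best_c
```

Under this toy model the agents settle on the lowest frequency and fan speed, since those maximize perf-per-watt without breaching the temperature limit; on a real server the trade-off (and the state-dependent Q-tables the paper's improved Q-learning handles) would be far richer.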
Journal overview
Sustainable computing is a rapidly expanding research area spanning the fields of computer science and engineering, electrical engineering, and other engineering disciplines. The aim of Sustainable Computing: Informatics and Systems (SUSCOM) is to publish research findings related to energy-aware and thermal-aware management of computing resources. Equally important is a spectrum of related research issues, such as applications of computing that can have ecological and societal impacts. SUSCOM publishes original and timely research papers and survey articles in the areas of power, energy, temperature, and environment-related research of current importance to readers. SUSCOM has an editorial board comprising prominent researchers from around the world and selects competitively evaluated, peer-reviewed papers.