{"title":"用于大规模复杂供热系统自主控制的可信强化学习框架:模拟和实地实施","authors":"Amirreza Heidari , Luc Girardin , Cédric Dorsaz , François Maréchal","doi":"10.1016/j.apenergy.2024.124815","DOIUrl":null,"url":null,"abstract":"<div><div>Traditional control approaches heavily rely on hard-coded expert knowledge, complicating the development of optimal control solutions as system complexity increases. Deep Reinforcement Learning (DRL) offers a self-learning control solution, proving advantageous in scenarios where crafting expert-based solutions becomes intricate. This study investigates the potential of DRL for supervisory control in a unique and complex heating system within a large-scale university building. The DRL framework aims to minimize energy costs while ensuring occupant comfort. However, the trial-and-error learning approach of DRL raises concerns about the trustworthiness of executed actions, hindering practical implementation. To address this, the study incorporates action masking, enabling the integration of hard constraints into DRL to enhance user trust. Maskable Proximal Policy Optimization (MPPO) is evaluated alongside standard Proximal Policy Optimization (PPO) and Soft Actor–Critic (SAC). Simulation results reveal that MPPO achieves comparable energy savings (8% relative to the baseline control) with fewer comfort violations than other methods. Therefore, it is selected among the candidate algorithms and experimentally implemented in the university building over one week. Experimental findings demonstrate that MPPO reduces energy costs while maintaining occupant comfort, resulting in a 36% saving compared to a historical day with similar weather conditions. These results underscore the proactive decision-making capability of DRL, establishing its viability for autonomous control in complex energy systems.</div></div>","PeriodicalId":246,"journal":{"name":"Applied Energy","volume":"378 ","pages":"Article 124815"},"PeriodicalIF":10.1000,"publicationDate":"2024-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A trustworthy reinforcement learning framework for autonomous control of a large-scale complex heating system: Simulation and field implementation\",\"authors\":\"Amirreza Heidari , Luc Girardin , Cédric Dorsaz , François Maréchal\",\"doi\":\"10.1016/j.apenergy.2024.124815\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Traditional control approaches heavily rely on hard-coded expert knowledge, complicating the development of optimal control solutions as system complexity increases. Deep Reinforcement Learning (DRL) offers a self-learning control solution, proving advantageous in scenarios where crafting expert-based solutions becomes intricate. This study investigates the potential of DRL for supervisory control in a unique and complex heating system within a large-scale university building. The DRL framework aims to minimize energy costs while ensuring occupant comfort. However, the trial-and-error learning approach of DRL raises concerns about the trustworthiness of executed actions, hindering practical implementation. To address this, the study incorporates action masking, enabling the integration of hard constraints into DRL to enhance user trust. Maskable Proximal Policy Optimization (MPPO) is evaluated alongside standard Proximal Policy Optimization (PPO) and Soft Actor–Critic (SAC). 
Simulation results reveal that MPPO achieves comparable energy savings (8% relative to the baseline control) with fewer comfort violations than other methods. Therefore, it is selected among the candidate algorithms and experimentally implemented in the university building over one week. Experimental findings demonstrate that MPPO reduces energy costs while maintaining occupant comfort, resulting in a 36% saving compared to a historical day with similar weather conditions. These results underscore the proactive decision-making capability of DRL, establishing its viability for autonomous control in complex energy systems.</div></div>\",\"PeriodicalId\":246,\"journal\":{\"name\":\"Applied Energy\",\"volume\":\"378 \",\"pages\":\"Article 124815\"},\"PeriodicalIF\":10.1000,\"publicationDate\":\"2024-11-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Applied Energy\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0306261924021986\",\"RegionNum\":1,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENERGY & FUELS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Applied Energy","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0306261924021986","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENERGY & FUELS","Score":null,"Total":0}
A trustworthy reinforcement learning framework for autonomous control of a large-scale complex heating system: Simulation and field implementation
Traditional control approaches heavily rely on hard-coded expert knowledge, complicating the development of optimal control solutions as system complexity increases. Deep Reinforcement Learning (DRL) offers a self-learning control solution, proving advantageous in scenarios where crafting expert-based solutions becomes intricate. This study investigates the potential of DRL for supervisory control in a unique and complex heating system within a large-scale university building. The DRL framework aims to minimize energy costs while ensuring occupant comfort. However, the trial-and-error learning approach of DRL raises concerns about the trustworthiness of executed actions, hindering practical implementation. To address this, the study incorporates action masking, enabling the integration of hard constraints into DRL to enhance user trust. Maskable Proximal Policy Optimization (MPPO) is evaluated alongside standard Proximal Policy Optimization (PPO) and Soft Actor–Critic (SAC). Simulation results reveal that MPPO achieves comparable energy savings (8% relative to the baseline control) with fewer comfort violations than other methods. Therefore, it is selected among the candidate algorithms and experimentally implemented in the university building over one week. Experimental findings demonstrate that MPPO reduces energy costs while maintaining occupant comfort, resulting in a 36% saving compared to a historical day with similar weather conditions. These results underscore the proactive decision-making capability of DRL, establishing its viability for autonomous control in complex energy systems.
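As a point of reference for how action masking imposes hard constraints on a learned policy, the minimal sketch below (plain Python/NumPy, not the authors' implementation; the discrete action set and the feasibility rule are hypothetical) shows the core idea: infeasible actions are assigned zero probability before sampling, so the agent can only explore actions that respect the constraint.

```python
import numpy as np

def masked_action_probabilities(logits, mask):
    """Convert raw policy logits to a distribution over feasible actions only.

    logits : raw policy outputs, shape (n_actions,)
    mask   : boolean array, True where the action is currently feasible
    """
    masked_logits = np.where(mask, logits, -np.inf)   # infeasible actions -> -inf
    shifted = masked_logits - masked_logits.max()      # numerical stability
    exp = np.exp(shifted)                               # exp(-inf) == 0
    return exp / exp.sum()

# Hypothetical example: four discrete supply-temperature setpoints,
# of which the last two would violate a comfort constraint right now.
logits = np.array([0.2, 1.5, 0.7, -0.3])
mask = np.array([True, True, False, False])

probs = masked_action_probabilities(logits, mask)
action = np.random.choice(len(probs), p=probs)  # only feasible setpoints can be drawn
```

Because the mask is applied at sampling time rather than through a penalty in the reward, the constraint holds from the very first training step, which is the property the abstract associates with user trust during field deployment.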
Journal Introduction:
Applied Energy serves as a platform for sharing innovations, research, development, and demonstrations in energy conversion, conservation, and sustainable energy systems. The journal covers topics such as optimal energy resource use, environmental pollutant mitigation, and energy process analysis. It welcomes original papers, review articles, technical notes, and letters to the editor. Authors are encouraged to submit manuscripts that bridge the gap between research, development, and implementation. The journal addresses a wide spectrum of topics, including fossil and renewable energy technologies, energy economics, and environmental impacts. Applied Energy also explores modeling and forecasting, conservation strategies, and the social and economic implications of energy policies, including climate change mitigation. It is complemented by the open-access journal Advances in Applied Energy.