基于元学习和模型的离线强化学习的暖通空调系统快速人在环控制

IF 3 3区计算机科学 Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE IEEE Transactions on Sustainable Computing Pub Date : 2023-03-01 DOI:10.1109/TSUSC.2023.3251302

Liangliang Chen;Fei Meng;Ying Zhang

{"title":"基于元学习和模型的离线强化学习的暖通空调系统快速人在环控制","authors":"Liangliang Chen;Fei Meng;Ying Zhang","doi":"10.1109/TSUSC.2023.3251302","DOIUrl":null,"url":null,"abstract":"Reinforcement learning (RL) methods can be used to develop a controller for the heating, ventilation, and air conditioning (HVAC) systems that both saves energy and ensures high occupants’ thermal comfort levels. However, the existing works typically require on-policy data to train an RL agent, and the occupants’ personalized thermal preferences are not considered, which is limited in the real-world scenarios. This paper designs a high-performance model-based offline RL algorithm for personalized HVAC systems. The proposed algorithm can quickly adapt to different occupants’ thermal preferences with a few thermal feedbacks, guaranteeing the high occupants’ personalized thermal comfort levels efficiently. First, we use a meta-supervised learning algorithm to train an occupant's thermal preference model. Then, we train an ensemble neural network to predict the thermal states of the considered zone. In addition, the obtained ensemble networks can indicate the regions in the state and action spaces covered by the offline dataset. With the personalized thermal preference model updated via meta-testing, model-based RL is used to derive the optimal HVAC controller. Since the proposed algorithm only requires offline datasets and a few online thermal feedbacks for training, it contributes to a more practical deployment of the RL algorithm to HVAC systems. We use the ASHRAE database II to verify the effectiveness and advantage of the meta-learning algorithm for modeling different occupants’ thermal preferences. Numerical simulations on the EnergyPlus environment demonstrate that the proposed algorithm can guarantee personalized thermal preferences with a slight increase of power consumption of 1.91% compared with the model-based RL algorithm with on-policy data aggregation.","PeriodicalId":13268,"journal":{"name":"IEEE Transactions on Sustainable Computing","volume":"8 3","pages":"504-521"},"PeriodicalIF":3.0000,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":"{\"title\":\"Fast Human-in-the-Loop Control for HVAC Systems via Meta-Learning and Model-Based Offline Reinforcement Learning\",\"authors\":\"Liangliang Chen;Fei Meng;Ying Zhang\",\"doi\":\"10.1109/TSUSC.2023.3251302\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Reinforcement learning (RL) methods can be used to develop a controller for the heating, ventilation, and air conditioning (HVAC) systems that both saves energy and ensures high occupants’ thermal comfort levels. However, the existing works typically require on-policy data to train an RL agent, and the occupants’ personalized thermal preferences are not considered, which is limited in the real-world scenarios. This paper designs a high-performance model-based offline RL algorithm for personalized HVAC systems. The proposed algorithm can quickly adapt to different occupants’ thermal preferences with a few thermal feedbacks, guaranteeing the high occupants’ personalized thermal comfort levels efficiently. First, we use a meta-supervised learning algorithm to train an occupant's thermal preference model. Then, we train an ensemble neural network to predict the thermal states of the considered zone. In addition, the obtained ensemble networks can indicate the regions in the state and action spaces covered by the offline dataset. With the personalized thermal preference model updated via meta-testing, model-based RL is used to derive the optimal HVAC controller. Since the proposed algorithm only requires offline datasets and a few online thermal feedbacks for training, it contributes to a more practical deployment of the RL algorithm to HVAC systems. We use the ASHRAE database II to verify the effectiveness and advantage of the meta-learning algorithm for modeling different occupants’ thermal preferences. Numerical simulations on the EnergyPlus environment demonstrate that the proposed algorithm can guarantee personalized thermal preferences with a slight increase of power consumption of 1.91% compared with the model-based RL algorithm with on-policy data aggregation.\",\"PeriodicalId\":13268,\"journal\":{\"name\":\"IEEE Transactions on Sustainable Computing\",\"volume\":\"8 3\",\"pages\":\"504-521\"},\"PeriodicalIF\":3.0000,\"publicationDate\":\"2023-03-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Sustainable Computing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10057050/\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Sustainable Computing","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10057050/","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}

引用次数: 4

摘要

强化学习（RL）方法可用于开发供暖、通风和空调（HVAC）系统的控制器，该控制器既能节省能源，又能确保乘客的热舒适度。然而，现有的工作通常需要策略数据来训练RL代理，并且没有考虑居住者的个性化热偏好，这在现实世界的场景中是有限的。本文设计了一种用于个性化暖通空调系统的基于模型的高性能离线RL算法。该算法可以通过少量的热反馈快速适应不同乘客的热偏好，有效地保证了高乘客的个性化热舒适度。首先，我们使用元监督学习算法来训练乘客的热偏好模型。然后，我们训练一个集成神经网络来预测所考虑区域的热状态。此外，所获得的集合网络可以指示离线数据集所覆盖的状态和动作空间中的区域。通过元测试更新个性化热偏好模型，使用基于模型的RL来推导最优HVAC控制器。由于所提出的算法只需要离线数据集和一些在线热反馈进行训练，因此它有助于将RL算法更实际地部署到暖通空调系统中。我们使用ASHRAE数据库II来验证元学习算法对不同居住者的热偏好建模的有效性和优势。EnergyPlus环境下的数值模拟表明，与基于策略的数据聚合的基于模型的RL算法相比，所提出的算法可以保证个性化的热偏好，功耗略微增加1.91%。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Fast Human-in-the-Loop Control for HVAC Systems via Meta-Learning and Model-Based Offline Reinforcement Learning

Reinforcement learning (RL) methods can be used to develop a controller for the heating, ventilation, and air conditioning (HVAC) systems that both saves energy and ensures high occupants’ thermal comfort levels. However, the existing works typically require on-policy data to train an RL agent, and the occupants’ personalized thermal preferences are not considered, which is limited in the real-world scenarios. This paper designs a high-performance model-based offline RL algorithm for personalized HVAC systems. The proposed algorithm can quickly adapt to different occupants’ thermal preferences with a few thermal feedbacks, guaranteeing the high occupants’ personalized thermal comfort levels efficiently. First, we use a meta-supervised learning algorithm to train an occupant's thermal preference model. Then, we train an ensemble neural network to predict the thermal states of the considered zone. In addition, the obtained ensemble networks can indicate the regions in the state and action spaces covered by the offline dataset. With the personalized thermal preference model updated via meta-testing, model-based RL is used to derive the optimal HVAC controller. Since the proposed algorithm only requires offline datasets and a few online thermal feedbacks for training, it contributes to a more practical deployment of the RL algorithm to HVAC systems. We use the ASHRAE database II to verify the effectiveness and advantage of the meta-learning algorithm for modeling different occupants’ thermal preferences. Numerical simulations on the EnergyPlus environment demonstrate that the proposed algorithm can guarantee personalized thermal preferences with a slight increase of power consumption of 1.91% compared with the model-based RL algorithm with on-policy data aggregation.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

IEEE Transactions on Sustainable Computing Mathematics-Control and Optimization

CiteScore

7.70

自引率

2.60%

发文量