Adaptive demand response: Online learning of restless and controlled bandits

Qingsi Wang, M. Liu, J. Mathieu
{"title":"自适应需求响应:在线学习躁动与控制的土匪","authors":"Qingsi Wang, M. Liu, J. Mathieu","doi":"10.1109/SmartGridComm.2014.7007738","DOIUrl":null,"url":null,"abstract":"The capabilities of electric loads participating in load curtailment programs are often unknown until the loads have been told to curtail (i.e., deployed) and observed. In programs in which payments are made each time a load is deployed, we aim to pick the “best” loads to deploy in each time step. Our choice is a tradeoff between exploration and exploitation, i.e., curtailing poorly characterized loads in order to better characterize them in the hope of benefiting in the future versus curtailing well-characterized loads so that we benefit now. We formulate this problem as a multi-armed restless bandit problem with controlled bandits. In contrast to past work that has assumed all load parameters are known allowing the use of optimization approaches, we assume the parameters of the controlled system are unknown and develop an online learning approach. Our problem has two features not commonly addressed in the bandit literature: the arms/processes evolve according to different probabilistic laws depending on the control, and the reward/feedback observed by the decision-maker is the total realized curtailment, not the curtailment of each load. We develop an adaptive demand response learning algorithm and an extended version that works with aggregate feedback, both aimed at approximating the Whittle index policy. We show numerically that the regret of our algorithms with respect to the Whittle index policy is of logarithmic order in time, and significantly outperforms standard learning algorithms like UCB1.","PeriodicalId":6499,"journal":{"name":"2014 IEEE International Conference on Smart Grid Communications (SmartGridComm)","volume":"47 1","pages":"752-757"},"PeriodicalIF":0.0000,"publicationDate":"2014-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"19","resultStr":"{\"title\":\"Adaptive demand response: Online learning of restless and controlled bandits\",\"authors\":\"Qingsi Wang, M. Liu, J. Mathieu\",\"doi\":\"10.1109/SmartGridComm.2014.7007738\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The capabilities of electric loads participating in load curtailment programs are often unknown until the loads have been told to curtail (i.e., deployed) and observed. In programs in which payments are made each time a load is deployed, we aim to pick the “best” loads to deploy in each time step. Our choice is a tradeoff between exploration and exploitation, i.e., curtailing poorly characterized loads in order to better characterize them in the hope of benefiting in the future versus curtailing well-characterized loads so that we benefit now. We formulate this problem as a multi-armed restless bandit problem with controlled bandits. In contrast to past work that has assumed all load parameters are known allowing the use of optimization approaches, we assume the parameters of the controlled system are unknown and develop an online learning approach. Our problem has two features not commonly addressed in the bandit literature: the arms/processes evolve according to different probabilistic laws depending on the control, and the reward/feedback observed by the decision-maker is the total realized curtailment, not the curtailment of each load. 
We develop an adaptive demand response learning algorithm and an extended version that works with aggregate feedback, both aimed at approximating the Whittle index policy. We show numerically that the regret of our algorithms with respect to the Whittle index policy is of logarithmic order in time, and significantly outperforms standard learning algorithms like UCB1.\",\"PeriodicalId\":6499,\"journal\":{\"name\":\"2014 IEEE International Conference on Smart Grid Communications (SmartGridComm)\",\"volume\":\"47 1\",\"pages\":\"752-757\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"19\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2014 IEEE International Conference on Smart Grid Communications (SmartGridComm)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SmartGridComm.2014.7007738\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 IEEE International Conference on Smart Grid Communications (SmartGridComm)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SmartGridComm.2014.7007738","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 19

Abstract

The capabilities of electric loads participating in load curtailment programs are often unknown until the loads have been told to curtail (i.e., deployed) and observed. In programs in which payments are made each time a load is deployed, we aim to pick the “best” loads to deploy in each time step. Our choice is a tradeoff between exploration and exploitation, i.e., curtailing poorly characterized loads in order to better characterize them in the hope of benefiting in the future versus curtailing well-characterized loads so that we benefit now. We formulate this problem as a multi-armed restless bandit problem with controlled bandits. In contrast to past work that has assumed all load parameters are known allowing the use of optimization approaches, we assume the parameters of the controlled system are unknown and develop an online learning approach. Our problem has two features not commonly addressed in the bandit literature: the arms/processes evolve according to different probabilistic laws depending on the control, and the reward/feedback observed by the decision-maker is the total realized curtailment, not the curtailment of each load. We develop an adaptive demand response learning algorithm and an extended version that works with aggregate feedback, both aimed at approximating the Whittle index policy. We show numerically that the regret of our algorithms with respect to the Whittle index policy is of logarithmic order in time, and significantly outperforms standard learning algorithms like UCB1.
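
The abstract's baseline, UCB1, is easy to make concrete. Below is a minimal Python sketch of a UCB1-style policy that deploys m loads per time step; every quantity in it (n_loads, true_mean, the Gaussian observation noise) is an illustrative assumption, not a value from the paper. Note that UCB1 assumes stationary rewards and per-load feedback, both of which the restless, aggregate-feedback setting described above violates; that mismatch is why the authors' index-based algorithms outperform it.

```python
import numpy as np

# Minimal, hypothetical UCB1-style baseline for demand response:
# each step, deploy the m loads with the highest UCB index and
# observe each deployed load's curtailment individually.
# (The paper's harder setting has restless, control-dependent load
# dynamics and only aggregate feedback; this sketch ignores both.)

rng = np.random.default_rng(0)
n_loads, m, horizon = 10, 3, 5000
true_mean = rng.uniform(0.2, 1.0, n_loads)  # assumed per-load mean curtailment

counts = np.zeros(n_loads)  # number of times each load has been deployed
means = np.zeros(n_loads)   # empirical mean curtailment per load

for t in range(1, horizon + 1):
    # UCB1 index: empirical mean plus exploration bonus; undeployed loads first.
    bonus = np.sqrt(2.0 * np.log(t) / np.maximum(counts, 1.0))
    index = np.where(counts == 0, np.inf, means + bonus)
    chosen = np.argsort(index)[-m:]  # deploy the m highest-index loads

    for i in chosen:
        # Noisy realized curtailment for load i (illustrative model).
        reward = rng.normal(true_mean[i], 0.1)
        counts[i] += 1
        means[i] += (reward - means[i]) / counts[i]

print("estimated mean curtailment:", np.round(means, 2))
print("true mean curtailment:     ", np.round(true_mean, 2))
```

Approximating the Whittle index policy instead, as the paper proposes, would replace the index computation above with per-load Whittle indices derived from estimated transition parameters under each control; learning those parameters online from (possibly aggregate) curtailment feedback is the problem the paper addresses.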