Optimization of parametric policies of Markov decision processes under a variance criterion

L. Xia
DOI: 10.1109/WODES.2016.7497868
Published in: 2016 13th International Workshop on Discrete Event Systems (WODES), May 2016
Citations: 3

Abstract

The variance criterion is an uncommon but important criterion in Markov decision processes. The non-Markovian property caused by the nonlinear (quadratic) structure of the variance function makes traditional MDP approaches invalid for this problem. In this paper, we study the optimization of parametric policies of MDPs under the variance criterion, where the optimization parameters are the probabilities of selecting actions at each state. With the basic idea of sensitivity-based optimization, we derive a difference formula and a derivative formula of the reward variance with respect to the system parameter. The variance difference formula is fundamental for this problem, and it partly handles the difficulty caused by the nonlinearity of the variance function through a nonnegative term. With these sensitivity formulas, we prove that the optimal policy with the minimal variance can be found in the deterministic policy space. A necessary condition for the optimal policy is also derived. Compared with the gradient-based approaches in the literature, our approach provides a clear viewpoint on this variance optimization problem.
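As an illustrative sketch only (not the paper's algorithm, and with made-up transition matrices and rewards), the setting described in the abstract can be reproduced numerically: parameterize a randomized stationary policy of a toy two-state, two-action MDP by the probabilities `theta` of choosing action 0 in each state, compute the steady-state variance of the one-step reward, and scan `theta` on a grid to see where the minimum-variance policy lies.

```python
# Toy illustration of the variance criterion for a parametric policy.
# All matrices and rewards below are hypothetical, chosen only so the
# chain is irreducible for every mixture of the two actions.
import numpy as np

# P[a] is the transition matrix under action a; r[a][s] is the reward
# for taking action a in state s.
P = {0: np.array([[0.9, 0.1], [0.2, 0.8]]),
     1: np.array([[0.3, 0.7], [0.6, 0.4]])}
r = {0: np.array([1.0, 4.0]), 1: np.array([3.0, 2.0])}

def variance(theta):
    """Steady-state variance of the one-step reward under the policy
    that picks action 0 with probability theta[s] in state s."""
    # Policy-averaged transition matrix (rows mixed state by state)
    P_pi = theta[:, None] * P[0] + (1 - theta[:, None]) * P[1]
    # First and second moments of the reward, averaged over the
    # action randomization in each state
    m1 = theta * r[0] + (1 - theta) * r[1]          # E[reward | s]
    m2 = theta * r[0] ** 2 + (1 - theta) * r[1] ** 2  # E[reward^2 | s]
    # Stationary distribution: left eigenvector of P_pi for eigenvalue 1
    w, v = np.linalg.eig(P_pi.T)
    pi = np.real(v[:, np.argmax(np.real(w))])
    pi = pi / pi.sum()
    mean = pi @ m1
    return pi @ m2 - mean ** 2

# Scan randomized policies theta in [0, 1]^2 on a grid
grid = np.linspace(0.0, 1.0, 21)
best = min((variance(np.array([t0, t1])), t0, t1)
           for t0 in grid for t1 in grid)
print("min variance %.4f at theta = (%.2f, %.2f)" % best)
```

Running the scan for such a toy instance lets one check where the grid minimum lands; the paper's result says a minimum-variance policy can always be found among the deterministic policies, i.e. at a corner of the parameter cube.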