{"title":"方差准则下马尔可夫决策过程参数策略的优化","authors":"L. Xia","doi":"10.1109/WODES.2016.7497868","DOIUrl":null,"url":null,"abstract":"The variance criterion is an uncommon while important criterion in Markov decision processes. The non-Markovian property caused by the nonlinear (quadratic) structure of variance function makes the traditional MDP approaches invalid for this problem. In this paper, we study the optimization of parametric policies of MDPs under the variance criterion, where the optimization parameters are the probabilities of selecting actions at each state. With the basic idea of sensitivity-based optimization, we derive a difference formula and a derivative formula of the reward variance with respect to the system parameter. The variance difference formula is fundamental for this problem and it partly handles the difficulty of nonlinear property of variance function through a nonnegative term. With these sensitivity formulas, we prove that the optimal policy with the minimal variance can be found in the deterministic policy space. A necessary condition of the optimal policy is also derived. Compared with the counterpart of gradient-based approaches in the literature, our approach can provide a clear viewpoint for this variance optimization problem.","PeriodicalId":268613,"journal":{"name":"2016 13th International Workshop on Discrete Event Systems (WODES)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"Optimization of parametric policies of Markov decision processes under a variance criterion\",\"authors\":\"L. Xia\",\"doi\":\"10.1109/WODES.2016.7497868\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The variance criterion is an uncommon while important criterion in Markov decision processes. The non-Markovian property caused by the nonlinear (quadratic) structure of variance function makes the traditional MDP approaches invalid for this problem. In this paper, we study the optimization of parametric policies of MDPs under the variance criterion, where the optimization parameters are the probabilities of selecting actions at each state. With the basic idea of sensitivity-based optimization, we derive a difference formula and a derivative formula of the reward variance with respect to the system parameter. The variance difference formula is fundamental for this problem and it partly handles the difficulty of nonlinear property of variance function through a nonnegative term. With these sensitivity formulas, we prove that the optimal policy with the minimal variance can be found in the deterministic policy space. A necessary condition of the optimal policy is also derived. 
Compared with the counterpart of gradient-based approaches in the literature, our approach can provide a clear viewpoint for this variance optimization problem.\",\"PeriodicalId\":268613,\"journal\":{\"name\":\"2016 13th International Workshop on Discrete Event Systems (WODES)\",\"volume\":\"8 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2016 13th International Workshop on Discrete Event Systems (WODES)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/WODES.2016.7497868\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 13th International Workshop on Discrete Event Systems (WODES)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/WODES.2016.7497868","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Optimization of parametric policies of Markov decision processes under a variance criterion
The variance criterion is an uncommon but important criterion in Markov decision processes (MDPs). The non-Markovian property caused by the nonlinear (quadratic) structure of the variance function renders traditional MDP approaches invalid for this problem. In this paper, we study the optimization of parametric policies of MDPs under the variance criterion, where the optimization parameters are the probabilities of selecting actions at each state. Using the basic idea of sensitivity-based optimization, we derive a difference formula and a derivative formula for the reward variance with respect to the system parameters. The variance difference formula is fundamental for this problem: through a nonnegative term, it partly handles the difficulty caused by the nonlinearity of the variance function. With these sensitivity formulas, we prove that an optimal policy with minimal variance can be found in the deterministic policy space. A necessary condition for the optimal policy is also derived. Compared with the gradient-based approaches in the literature, our approach provides a clearer viewpoint on this variance optimization problem.
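To make the setting concrete, here is a minimal numerical sketch (not code from the paper) under illustrative assumptions: a small finite, ergodic MDP; a randomized policy parameterized by action-selection probabilities theta[s, a]; and the variance criterion taken as the steady-state variance of the one-step reward. The paper's closed-form difference and derivative formulas are not reproduced here; a central finite difference stands in for the derivative of the variance with respect to a policy parameter. All names (avg_reward_and_variance, theta_of) and the toy numbers are hypothetical.

```python
import numpy as np

def stationary_dist(P_pi):
    """Stationary distribution of an ergodic transition matrix P_pi
    (solves pi @ P_pi = pi with pi summing to 1, via least squares)."""
    n = P_pi.shape[0]
    A = np.vstack([P_pi.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

def avg_reward_and_variance(P, r, theta):
    """Average reward eta and steady-state reward variance under the
    randomized policy theta[s, a] (assumed definition of the criterion)."""
    P_pi = np.einsum('sa,sat->st', theta, P)   # policy-averaged transitions
    r_pi = np.einsum('sa,sa->s', theta, r)     # policy-averaged rewards
    pi = stationary_dist(P_pi)
    eta = pi @ r_pi
    # Steady-state variance of the one-step reward:
    # sum_s pi(s) sum_a theta(s, a) * (r(s, a) - eta)^2
    var = np.einsum('s,sa,sa->', pi, theta, (r - eta) ** 2)
    return eta, var

# Toy 2-state, 2-action MDP (numbers are illustrative only).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.7, 0.3]]])  # P[s, a, s']
r = np.array([[1.0, 4.0],
              [2.0, 3.0]])                # r[s, a]

def theta_of(p):
    """One scalar parameter p = probability of action 0 at state 0;
    state 1 keeps a fixed randomized rule."""
    return np.array([[p, 1.0 - p], [0.5, 0.5]])

# Finite-difference estimate of d(variance)/dp at p = 0.3.
p, eps = 0.3, 1e-6
_, v_plus = avg_reward_and_variance(P, r, theta_of(p + eps))
_, v_minus = avg_reward_and_variance(P, r, theta_of(p - eps))
print("d(variance)/dp ~", (v_plus - v_minus) / (2 * eps))
```

Scanning p over a grid in [0, 1] gives a quick picture of how the variance responds to the randomization at state 0; the sensitivity formulas derived in the paper replace such brute-force probing with exact difference and derivative expressions.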