{"title":"方差准则下马尔可夫决策过程参数策略的优化","authors":"L. Xia","doi":"10.1109/WODES.2016.7497868","DOIUrl":null,"url":null,"abstract":"The variance criterion is an uncommon while important criterion in Markov decision processes. The non-Markovian property caused by the nonlinear (quadratic) structure of variance function makes the traditional MDP approaches invalid for this problem. In this paper, we study the optimization of parametric policies of MDPs under the variance criterion, where the optimization parameters are the probabilities of selecting actions at each state. With the basic idea of sensitivity-based optimization, we derive a difference formula and a derivative formula of the reward variance with respect to the system parameter. The variance difference formula is fundamental for this problem and it partly handles the difficulty of nonlinear property of variance function through a nonnegative term. With these sensitivity formulas, we prove that the optimal policy with the minimal variance can be found in the deterministic policy space. A necessary condition of the optimal policy is also derived. Compared with the counterpart of gradient-based approaches in the literature, our approach can provide a clear viewpoint for this variance optimization problem.","PeriodicalId":268613,"journal":{"name":"2016 13th International Workshop on Discrete Event Systems (WODES)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"Optimization of parametric policies of Markov decision processes under a variance criterion\",\"authors\":\"L. Xia\",\"doi\":\"10.1109/WODES.2016.7497868\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The variance criterion is an uncommon while important criterion in Markov decision processes. The non-Markovian property caused by the nonlinear (quadratic) structure of variance function makes the traditional MDP approaches invalid for this problem. In this paper, we study the optimization of parametric policies of MDPs under the variance criterion, where the optimization parameters are the probabilities of selecting actions at each state. With the basic idea of sensitivity-based optimization, we derive a difference formula and a derivative formula of the reward variance with respect to the system parameter. The variance difference formula is fundamental for this problem and it partly handles the difficulty of nonlinear property of variance function through a nonnegative term. With these sensitivity formulas, we prove that the optimal policy with the minimal variance can be found in the deterministic policy space. A necessary condition of the optimal policy is also derived. 
Compared with the counterpart of gradient-based approaches in the literature, our approach can provide a clear viewpoint for this variance optimization problem.\",\"PeriodicalId\":268613,\"journal\":{\"name\":\"2016 13th International Workshop on Discrete Event Systems (WODES)\",\"volume\":\"8 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2016 13th International Workshop on Discrete Event Systems (WODES)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/WODES.2016.7497868\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 13th International Workshop on Discrete Event Systems (WODES)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/WODES.2016.7497868","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Optimization of parametric policies of Markov decision processes under a variance criterion
The variance criterion is an uncommon but important criterion in Markov decision processes (MDPs). The non-Markovian property caused by the nonlinear (quadratic) structure of the variance function renders traditional MDP approaches invalid for this problem. In this paper, we study the optimization of parametric policies of MDPs under the variance criterion, where the optimization parameters are the probabilities of selecting actions at each state. Using the basic idea of sensitivity-based optimization, we derive a difference formula and a derivative formula for the reward variance with respect to the system parameters. The variance difference formula is fundamental for this problem: through a nonnegative term, it partly handles the difficulty caused by the nonlinearity of the variance function. With these sensitivity formulas, we prove that an optimal policy with minimal variance can be found in the deterministic policy space. A necessary condition for the optimal policy is also derived. Compared with the gradient-based approaches in the literature, our approach provides a clearer viewpoint on this variance optimization problem.
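To make the setting concrete, here is a minimal numerical sketch (not code from the paper) under illustrative assumptions: a small finite, ergodic MDP; a randomized policy parameterized by action-selection probabilities theta[s, a]; and the variance criterion taken as the steady-state variance of the one-step reward. The paper's closed-form difference and derivative formulas are not reproduced here; a central finite difference stands in for the derivative of the variance with respect to a policy parameter. All names (avg_reward_and_variance, theta_of) and the toy numbers are hypothetical.

```python
import numpy as np

def stationary_dist(P_pi):
    """Stationary distribution of an ergodic transition matrix P_pi
    (solves pi @ P_pi = pi with pi summing to 1, via least squares)."""
    n = P_pi.shape[0]
    A = np.vstack([P_pi.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

def avg_reward_and_variance(P, r, theta):
    """Average reward eta and steady-state reward variance under the
    randomized policy theta[s, a] (assumed definition of the criterion)."""
    P_pi = np.einsum('sa,sat->st', theta, P)   # policy-averaged transitions
    r_pi = np.einsum('sa,sa->s', theta, r)     # policy-averaged rewards
    pi = stationary_dist(P_pi)
    eta = pi @ r_pi
    # Steady-state variance of the one-step reward:
    # sum_s pi(s) sum_a theta(s, a) * (r(s, a) - eta)^2
    var = np.einsum('s,sa,sa->', pi, theta, (r - eta) ** 2)
    return eta, var

# Toy 2-state, 2-action MDP (numbers are illustrative only).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.7, 0.3]]])  # P[s, a, s']
r = np.array([[1.0, 4.0],
              [2.0, 3.0]])                # r[s, a]

def theta_of(p):
    """One scalar parameter p = probability of action 0 at state 0;
    state 1 keeps a fixed randomized rule."""
    return np.array([[p, 1.0 - p], [0.5, 0.5]])

# Finite-difference estimate of d(variance)/dp at p = 0.3.
p, eps = 0.3, 1e-6
_, v_plus = avg_reward_and_variance(P, r, theta_of(p + eps))
_, v_minus = avg_reward_and_variance(P, r, theta_of(p - eps))
print("d(variance)/dp ~", (v_plus - v_minus) / (2 * eps))
```

Scanning p over a grid in [0, 1] gives a quick picture of how the variance responds to the randomization at state 0; the sensitivity formulas derived in the paper replace such brute-force probing with exact difference and derivative expressions.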