Markov decision processes with risk-sensitive criteria: dynamic programming operators and discounted stochastic games

Proceedings of the 40th IEEE Conference on Decision and Control (Cat. No.01CH37228). Pub Date: 2001-12-04. DOI: 10.1109/CDC.2001.980564

R. Cavazos-Cadena, E. Fernández-Gaucherand


Citations: 3

Abstract



We study discrete-time Markov decision processes with denumerable state space and bounded costs per stage. It is assumed that the decision maker exhibits a constant sensitivity to risk, and that the performance of a control policy is measured by a (long-run) risk-sensitive average cost criterion. Besides standard continuity-compactness conditions, the basic structural constraint on the decision model is that the transition law satisfies a simultaneous Doeblin condition. Within this framework, the main objective is to study the existence of bounded solutions to the risk-sensitive average cost optimality equation. Our main result guarantees a bounded solution to the optimality equation only if the risk sensitivity coefficient λ is small enough and, via a detailed example, it can be shown that such a conclusion cannot be extended to arbitrary values of λ. Our results are in opposition to previous claims in the literature, but agree with recent results obtained via a direct probabilistic analysis. A key analysis tool developed in the paper is the definition of an appropriate operator with contractive properties, analogous to the dynamic programming operator in Bellman's equation, and a family of (value) functions with a discounted stochastic games interpretation.
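To make the optimality equation concrete, the sketch below runs a plain relative value iteration on a small finite model. It is an illustration under stated assumptions, not the operator constructed in the paper: the state space is finite (the abstract treats denumerable spaces), every transition probability is strictly positive, λ > 0, and the optimality equation is taken in the form commonly used in the risk-sensitive literature, exp(λ(g + h(x))) = min_a exp(λC(x,a)) Σ_y p(y|x,a) exp(λh(y)). The function name `risk_sensitive_rvi`, the arrays `P` and `C`, and all numeric values are hypothetical.

```python
import math


def risk_sensitive_rvi(P, C, lam, ref=0, tol=1e-12, max_iter=10_000):
    """Relative value iteration for a risk-sensitive average cost (sketch).

    P[a][x][y] : probability of moving from state x to y under action a
                 (assumed strictly positive in this illustration)
    C[x][a]    : bounded cost of taking action a in state x
    lam        : risk-sensitivity coefficient, assumed > 0
    Returns (g, h) with h[ref] == 0, an approximate solution of
        g + h(x) = min_a [ C(x,a) + (1/lam) * log sum_y P(y|x,a) * exp(lam*h(y)) ].
    """
    n_states = len(C)
    n_actions = len(P)
    h = [0.0] * n_states
    g = 0.0
    for _ in range(max_iter):
        # Apply the log-domain form of the optimality operator to h.
        Th = [
            min(
                C[x][a]
                + math.log(
                    sum(P[a][x][y] * math.exp(lam * h[y]) for y in range(n_states))
                ) / lam
                for a in range(n_actions)
            )
            for x in range(n_states)
        ]
        # Normalize at the reference state so the iterates stay bounded.
        g = Th[ref]
        new_h = [v - g for v in Th]
        if max(abs(u - v) for u, v in zip(new_h, h)) < tol:
            return g, new_h
        h = new_h
    return g, h  # not converged within max_iter; last iterate is returned


# Hypothetical two-state, two-action model.
P = [
    [[0.9, 0.1], [0.5, 0.5]],  # action 0
    [[0.2, 0.8], [0.6, 0.4]],  # action 1
]
C = [[1.0, 2.0], [3.0, 0.5]]

g, h = risk_sensitive_rvi(P, C, lam=0.5)
print("approximate average cost g:", g)
print("relative value function h:", h)
```

On a finite model with positive transition probabilities a bounded pair (g, h) is unproblematic; the abstract's point is that on a denumerable state space under the simultaneous Doeblin condition such a bounded solution is guaranteed only for sufficiently small λ, which this toy example does not exhibit.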
