Markov decision processes with risk-sensitive criteria: dynamic programming operators and discounted stochastic games

R. Cavazos-Cadena, E. Fernández-Gaucherand
Published in: Proceedings of the 40th IEEE Conference on Decision and Control (Cat. No.01CH37228)
Publication date: 2001-12-04
DOI: 10.1109/CDC.2001.980564 (https://doi.org/10.1109/CDC.2001.980564)
Citations: 3

Abstract

We study discrete-time Markov decision processes with denumerable state space and bounded costs per stage. It is assumed that the decision maker exhibits a constant sensitivity to risk, and that the performance of a control policy is measured by a (long-run) risk-sensitive average cost criterion. Besides standard continuity-compactness conditions, the basic structural constraint on the decision model is that the transition law satisfies a simultaneous Doeblin condition. Within this framework, the main objective is to study the existence of bounded solutions to the risk-sensitive average cost optimality equation. Our main result guarantees a bounded solution to the optimality equation only if the risk sensitivity coefficient λ is small enough and, via a detailed example, it can be shown that such a conclusion cannot be extended to arbitrary values of λ. Our results are in opposition to previous claims in the literature, but agree with recent results obtained via a direct probabilistic analysis. A key analysis tool developed in the paper is the definition of an appropriate operator with contractive properties, analogous to the dynamic programming operator in Bellman's equation, and a family of (value) functions with a discounted stochastic games interpretation.
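To make the optimality equation mentioned in the abstract concrete, here is a minimal sketch. It is not the operator constructed in the paper. It assumes a finite state space with strictly positive transition probabilities, which is far more restrictive than the paper's denumerable setting under a simultaneous Doeblin condition. It also assumes the multiplicative form of the risk-sensitive average cost optimality equation commonly used in this literature, exp(λ(g + h(x))) = min_a exp(λ C(x,a)) Σ_y p(y|x,a) exp(λ h(y)). The function name, the arrays `P` and `C`, and the toy numbers are all hypothetical.

```python
import math


def risk_sensitive_value_iteration(P, C, lam, iters=2000, tol=1e-12):
    """Normalised multiplicative value iteration (illustrative assumption).

    P[x][a][y] : transition probability from x to y under action a (all > 0)
    C[x][a]    : bounded one-stage cost
    lam        : risk-sensitivity coefficient, lam > 0
    Returns (g, h) approximately solving
        exp(lam * (g + h[x])) = min_a exp(lam * C[x][a]) * sum_y P[x][a][y] * exp(lam * h[y])
    with the normalisation h[0] = 0.
    """
    n = len(P)
    u = [1.0] * n  # u[x] = exp(lam * h[x]); start from h = 0
    rho = 1.0      # running estimate of exp(lam * g)
    for _ in range(iters):
        # Apply the multiplicative dynamic programming operator to u.
        new = [
            min(
                math.exp(lam * C[x][a])
                * sum(p * u[y] for y, p in enumerate(P[x][a]))
                for a in range(len(P[x]))
            )
            for x in range(n)
        ]
        rho_new = new[0]  # normalise at reference state 0
        new = [v / rho_new for v in new]
        converged = (
            abs(rho_new - rho) < tol
            and max(abs(v - w) for v, w in zip(new, u)) < tol
        )
        u, rho = new, rho_new
        if converged:
            break
    g = math.log(rho) / lam
    h = [math.log(v) / lam for v in u]
    return g, h


# Hypothetical two-state, two-action model with positive transition probabilities.
P = [
    [[0.9, 0.1], [0.5, 0.5]],
    [[0.2, 0.8], [0.6, 0.4]],
]
C = [
    [1.0, 0.5],
    [3.0, 2.0],
]
LAM = 0.5

g, h = risk_sensitive_value_iteration(P, C, LAM)
print("risk-sensitive average cost g:", g)
print("relative value function h:", h)
```

In this toy model a bounded solution exists for every λ > 0 because the state space is finite and all transition probabilities are positive. The abstract's point is that in the denumerable case under the Doeblin condition, a bounded solution is guaranteed only for sufficiently small λ.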