由风险寻求控制器驱动的马尔可夫决策链的最优平均成本表征

IF 0.7 4区 数学 Q3 STATISTICS & PROBABILITY Journal of Applied Probability Pub Date : 2023-07-21 DOI:10.1017/jpr.2023.40
R. Cavazos-Cadena, H. Cruz-Suárez, Raúl Montes-de-Oca
{"title":"由风险寻求控制器驱动的马尔可夫决策链的最优平均成本表征","authors":"R. Cavazos-Cadena, H. Cruz-Suárez, Raúl Montes-de-Oca","doi":"10.1017/jpr.2023.40","DOIUrl":null,"url":null,"abstract":"\n This work concerns Markov decision chains on a denumerable state space endowed with a bounded cost function. The performance of a control policy is assessed by a long-run average criterion as measured by a risk-seeking decision maker with constant risk-sensitivity. Besides standard continuity–compactness conditions, the framework of the paper is determined by the following conditions: (i) the state process is communicating under each stationary policy, and (ii) the simultaneous Doeblin condition holds. Within this framework it is shown that (i) the optimal superior and inferior limit average value functions coincide and are constant, and (ii) the optimal average cost is characterized via an extended version of the Collatz–Wielandt formula in the theory of positive matrices.","PeriodicalId":50256,"journal":{"name":"Journal of Applied Probability","volume":null,"pages":null},"PeriodicalIF":0.7000,"publicationDate":"2023-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Characterization of the optimal average cost in Markov decision chains driven by a risk-seeking controller\",\"authors\":\"R. Cavazos-Cadena, H. Cruz-Suárez, Raúl Montes-de-Oca\",\"doi\":\"10.1017/jpr.2023.40\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"\\n This work concerns Markov decision chains on a denumerable state space endowed with a bounded cost function. The performance of a control policy is assessed by a long-run average criterion as measured by a risk-seeking decision maker with constant risk-sensitivity. Besides standard continuity–compactness conditions, the framework of the paper is determined by the following conditions: (i) the state process is communicating under each stationary policy, and (ii) the simultaneous Doeblin condition holds. Within this framework it is shown that (i) the optimal superior and inferior limit average value functions coincide and are constant, and (ii) the optimal average cost is characterized via an extended version of the Collatz–Wielandt formula in the theory of positive matrices.\",\"PeriodicalId\":50256,\"journal\":{\"name\":\"Journal of Applied Probability\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.7000,\"publicationDate\":\"2023-07-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Applied Probability\",\"FirstCategoryId\":\"100\",\"ListUrlMain\":\"https://doi.org/10.1017/jpr.2023.40\",\"RegionNum\":4,\"RegionCategory\":\"数学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"STATISTICS & PROBABILITY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Applied Probability","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.1017/jpr.2023.40","RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"STATISTICS & PROBABILITY","Score":null,"Total":0}
引用次数: 0

摘要

本文研究具有有限代价函数的可数状态空间上的马尔可夫决策链。控制政策的绩效是通过长期平均标准来评估的,该标准是由具有恒定风险敏感性的风险寻求决策者衡量的。除了标准的连续紧性条件外,本文的框架由以下条件确定:(i)状态过程在每个平稳策略下都是通信的,(ii)同时Doeblin条件成立。在此框架内,证明了(i)最优上、下极限平均值函数重合且为常数,(ii)最优平均代价通过正矩阵理论中Collatz-Wielandt公式的扩展版本来表征。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Characterization of the optimal average cost in Markov decision chains driven by a risk-seeking controller
This work concerns Markov decision chains on a denumerable state space endowed with a bounded cost function. The performance of a control policy is assessed by a long-run average criterion as measured by a risk-seeking decision maker with constant risk-sensitivity. Besides standard continuity–compactness conditions, the framework of the paper is determined by the following conditions: (i) the state process is communicating under each stationary policy, and (ii) the simultaneous Doeblin condition holds. Within this framework it is shown that (i) the optimal superior and inferior limit average value functions coincide and are constant, and (ii) the optimal average cost is characterized via an extended version of the Collatz–Wielandt formula in the theory of positive matrices.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Journal of Applied Probability
Journal of Applied Probability 数学-统计学与概率论
CiteScore
1.50
自引率
10.00%
发文量
92
审稿时长
6-12 weeks
期刊介绍: Journal of Applied Probability is the oldest journal devoted to the publication of research in the field of applied probability. It is an international journal published by the Applied Probability Trust, and it serves as a companion publication to the Advances in Applied Probability. Its wide audience includes leading researchers across the entire spectrum of applied probability, including biosciences applications, operations research, telecommunications, computer science, engineering, epidemiology, financial mathematics, the physical and social sciences, and any field where stochastic modeling is used. A submission to Applied Probability represents a submission that may, at the Editor-in-Chief’s discretion, appear in either the Journal of Applied Probability or the Advances in Applied Probability. Typically, shorter papers appear in the Journal, with longer contributions appearing in the Advances.
期刊最新文献
The dutch draw: constructing a universal baseline for binary classification problems Transience of continuous-time conservative random walks Efficiency of reversible MCMC methods: elementary derivations and applications to composite methods A non-homogeneous alternating renewal process model for interval censoring An algorithm to construct coherent systems using signatures
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1