Characterization of the optimal average cost in Markov decision chains driven by a risk-seeking controller

Journal of Applied Probability (IF 0.7; Zone 4, Mathematics; JCR Q3, Statistics & Probability). Pub Date: 2023-07-21. DOI: 10.1017/jpr.2023.40

R. Cavazos-Cadena, H. Cruz-Suárez, Raúl Montes-de-Oca

Citations: 0

Abstract

This work concerns Markov decision chains on a denumerable state space endowed with a bounded cost function. The performance of a control policy is assessed by a long-run average criterion as measured by a risk-seeking decision maker with constant risk-sensitivity. Besides standard continuity–compactness conditions, the framework of the paper is determined by the following conditions: (i) the state process is communicating under each stationary policy, and (ii) the simultaneous Doeblin condition holds. Within this framework it is shown that (i) the optimal superior and inferior limit average value functions coincide and are constant, and (ii) the optimal average cost is characterized via an extended version of the Collatz–Wielandt formula in the theory of positive matrices.
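As a rough illustration of the objects named in the abstract, the sketch below treats only the simplest case: a finite chain under one fixed policy. It is not the paper's construction, which covers denumerable state spaces and optimization over policies. It relies on two classical facts: for a positive matrix A and any strictly positive vector x, the Perron root lies between min_i (Ax)_i/x_i and max_i (Ax)_i/x_i (the Collatz–Wielandt bounds), and the risk-sensitive average cost of a fixed policy equals (1/λ) log ρ(Q) with Q[x][y] = exp(λ c(x)) P[x][y]. The two-state chain, the function names, and the convention that λ < 0 models a risk-seeking controller are assumptions made for this example.

```python
import math


def collatz_wielandt_bounds(matrix, x):
    """Return (min_i (Ax)_i / x_i, max_i (Ax)_i / x_i) for a strictly positive x."""
    ax = [sum(a * xj for a, xj in zip(row, x)) for row in matrix]
    ratios = [axi / xi for axi, xi in zip(ax, x)]
    return min(ratios), max(ratios)


def perron_root(matrix, iterations=500):
    """Approximate the Perron root of a positive matrix by power iteration."""
    x = [1.0] * len(matrix)
    for _ in range(iterations):
        ax = [sum(a * xj for a, xj in zip(row, x)) for row in matrix]
        total = sum(ax)
        x = [v / total for v in ax]  # renormalize to avoid overflow or underflow
    lower, upper = collatz_wielandt_bounds(matrix, x)
    return 0.5 * (lower + upper)  # the two bounds agree at the Perron eigenvector


def risk_sensitive_average_cost(transition, cost, risk_sensitivity):
    """Fixed policy, finite chain: g = (1/lambda) * log rho(Q)."""
    q = [
        [math.exp(risk_sensitivity * c) * p for p in row]
        for c, row in zip(cost, transition)
    ]
    return math.log(perron_root(q)) / risk_sensitivity


# Hypothetical two-state chain and bounded cost, chosen only for illustration.
P = [[0.9, 0.1], [0.5, 0.5]]
c = [0.0, 1.0]

g_seeking = risk_sensitive_average_cost(P, c, -1.0)  # assumed sign: lambda < 0
g_averse = risk_sensitive_average_cost(P, c, 1.0)

# Collatz-Wielandt bounds for the risk-seeking matrix at the test vector (1, 1).
Q_seeking = [[math.exp(-1.0 * ci) * p for p in row] for ci, row in zip(c, P)]
cw_lower, cw_upper = collatz_wielandt_bounds(Q_seeking, [1.0, 1.0])
rho_seeking = perron_root(Q_seeking)
```

For this chain the stationary distribution is (5/6, 1/6), so the risk-neutral long-run average cost is 1/6. By Jensen's inequality the risk-seeking value falls below 1/6 and the risk-averse value above it, and `rho_seeking` sits between `cw_lower` and `cw_upper`.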

Source journal

Journal of Applied Probability (Mathematics: Statistics & Probability)

CiteScore: 1.50

Self-citation rate: 10.00%

Articles published: (not given)

Review time: 6-12 weeks

About the journal: Journal of Applied Probability is the oldest journal devoted to the publication of research in the field of applied probability. It is an international journal published by the Applied Probability Trust, and it serves as a companion publication to the Advances in Applied Probability. Its wide audience includes leading researchers across the entire spectrum of applied probability, including biosciences applications, operations research, telecommunications, computer science, engineering, epidemiology, financial mathematics, the physical and social sciences, and any field where stochastic modeling is used. A submission to Applied Probability represents a submission that may, at the Editor-in-Chief’s discretion, appear in either the Journal of Applied Probability or the Advances in Applied Probability. Typically, shorter papers appear in the Journal, with longer contributions appearing in the Advances.