Another Look at Partially Observed Optimal Stochastic Control: Existence, Ergodicity, and Approximations Without Belief-Reduction

Author: Serdar Yüksel
Journal: Applied Mathematics and Optimization, 91(1)
DOI: 10.1007/s00245-024-10211-9
Published: 2025-01-04 (Journal Article)
URL: https://link.springer.com/article/10.1007/s00245-024-10211-9
JCR: Q2 (Mathematics, Applied)
We present an alternative view for the study of optimal control of partially observed Markov Decision Processes (POMDPs). We first revisit the traditional (and by now standard) separated-design method of reducing the problem to fully observed MDPs (belief-MDPs), and present conditions for the existence of optimal policies. Then, rather than working with this standard method, we define a Markov chain taking values in an infinite dimensional product space, with the history process serving as the controlled state process, and a further refinement in which the control actions and the state process are causally conditionally independent given the measurement/information process. We provide new sufficient conditions for the existence of optimal control policies under the discounted cost and average cost infinite horizon criteria. In particular, while in the belief-MDP reduction of POMDPs the weak Feller condition requires total variation continuity of either the system kernel or the measurement kernel, with the approach of this paper only weak continuity of both the transition kernel and the measurement kernel is needed (and total variation continuity is not), together with regularity conditions related to filter stability. For the discounted cost setup, we establish near optimality of finite window policies via a direct argument involving near optimality of quantized approximations for MDPs under weak Feller continuity, where finite truncations of memory can be viewed as quantizations of infinite memory with a uniform diameter in each finite window restriction under the product metric. For the average cost setup, we provide new existence conditions and also a general approach on how to initialize the randomness, which we show establishes convergence to the optimal cost.
In the control-free case, our analysis leads to new and weak conditions for the existence and uniqueness of invariant probability measures for nonlinear filter processes, where we show that unique ergodicity of the measurement process and a measurability condition related to filter stability lead to unique ergodicity.
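As a concrete illustration of the belief-MDP reduction that the abstract revisits, the following minimal sketch (not from the paper; the two-state kernels `T` and `O` are made-up numbers) runs one version of the nonlinear filter recursion: each step predicts the hidden state through the transition kernel and then corrects by the observation likelihood, so the belief itself evolves as a Markov process driven by the measurements.

```python
# Minimal sketch of the nonlinear filter (belief) recursion for a
# hypothetical two-state, two-observation POMDP. All numbers are
# illustrative assumptions, not taken from the paper.

T = [[0.9, 0.1],   # T[x][x']: transition kernel P(x' | x)
     [0.2, 0.8]]
O = [[0.8, 0.2],   # O[x][y]: measurement kernel P(y | x)
     [0.3, 0.7]]

def belief_update(pi, y):
    """One step of the filter: predict with T, correct with the
    likelihood of observation y, then normalize. The belief pi is
    the state of the belief-MDP."""
    n = len(pi)
    predicted = [sum(pi[x] * T[x][xp] for x in range(n)) for xp in range(n)]
    unnormalized = [predicted[xp] * O[xp][y] for xp in range(n)]
    z = sum(unnormalized)
    return [u / z for u in unnormalized]

pi = [0.5, 0.5]            # prior belief over the two hidden states
for y in [0, 0, 1, 0]:     # an observed measurement sequence
    pi = belief_update(pi, y)
print(pi)                  # posterior belief after the measurement sequence
```

A finite window policy in the abstract's sense would discard the belief and act only on the last N measurements; under the product metric each such truncation has uniform diameter, which is what lets it be treated as a quantization of the infinite history.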
Journal Description:
The Applied Mathematics and Optimization Journal covers a broad range of mathematical methods, in particular those that bridge with optimization and have some connection with applications. Core topics include the calculus of variations, partial differential equations, stochastic control, optimization of deterministic or stochastic systems in discrete or continuous time, homogenization, control theory, mean field games, dynamic games, and optimal transport. Algorithmic, data-analytic, machine-learning, and numerical methods which support the modeling and analysis of optimization problems are encouraged. Of great interest are papers that present a novel idea in either the theory or the model and include some connection with potential applications in science and engineering.