Another Look at Partially Observed Optimal Stochastic Control: Existence, Ergodicity, and Approximations Without Belief-Reduction

IF 1.6 | CAS Zone 2 (Mathematics) | Q2 (Mathematics, Applied) | Applied Mathematics and Optimization | Pub Date: 2025-01-04 | DOI: 10.1007/s00245-024-10211-9
Serdar Yüksel
Citations: 0

Abstract

We present an alternative view for the study of optimal control of partially observed Markov Decision Processes (POMDPs). We first revisit the traditional (and by now standard) separated-design method of reducing the problem to a fully observed MDP (the belief-MDP), and present conditions for the existence of optimal policies. Then, rather than working with this standard method, we define a Markov chain taking values in an infinite-dimensional product space, with the history process serving as the controlled state process, and a further refinement in which the control actions and the state process are causally conditionally independent given the measurement/information process. We provide new sufficient conditions for the existence of optimal control policies under the discounted-cost and average-cost infinite-horizon criteria. In particular, while the weak Feller condition for the belief-MDP reduction of POMDPs imposes total variation continuity on either the system kernel or the measurement kernel, the approach of this paper requires only weak continuity of both the transition kernel and the measurement kernel (and not total variation continuity), together with regularity conditions related to filter stability. For the discounted-cost setup, we establish near optimality of finite-window policies via a direct argument involving near optimality of quantized approximations for MDPs under weak Feller continuity, where finite truncations of memory can be viewed as quantizations of infinite memory with a uniform diameter in each finite-window restriction under the product metric. For the average-cost setup, we provide new existence conditions and also a general approach to initializing the randomness, which we show establishes convergence to the optimal cost.
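The belief-MDP reduction mentioned above rests on the standard nonlinear filter recursion: predict through the transition kernel, then correct by the observation likelihood. A minimal sketch for a finite POMDP, with purely illustrative kernels of our own choosing (none of these numbers come from the paper):

```python
def belief_update(belief, u, y, T, O):
    """One step of the nonlinear filter (Bayes recursion): predict with
    the transition kernel T[u], then correct with the observation kernel O."""
    n = len(belief)
    # Predict: P(x' | history, action u)
    predicted = [sum(belief[x] * T[u][x][x2] for x in range(n)) for x2 in range(n)]
    # Correct: multiply by the likelihood P(y | x') and normalize
    unnorm = [predicted[x2] * O[x2][y] for x2 in range(n)]
    z = sum(unnorm)  # normalizing constant P(y | past)
    return [p / z for p in unnorm]

# Hypothetical 2-state, 2-observation, 2-action model (illustrative numbers).
T = [[[0.9, 0.1], [0.2, 0.8]],   # T[0]: transition kernel under action u=0
     [[0.5, 0.5], [0.4, 0.6]]]   # T[1]: transition kernel under action u=1
O = [[0.8, 0.2],                 # O[0]: observation likelihoods given x=0
     [0.3, 0.7]]                 # O[1]: observation likelihoods given x=1

belief = [0.5, 0.5]              # uniform prior
belief = belief_update(belief, u=0, y=1, T=T, O=O)
```

Iterating this map is exactly what makes the belief a fully observed controlled state; the finite-window policies of the paper instead act on a truncation of the raw measurement/action history, avoiding the belief state altogether.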
In the control-free case, our analysis leads to new and weak conditions for the existence and uniqueness of invariant probability measures for nonlinear filter processes: we show that unique ergodicity of the measurement process, together with a measurability condition related to filter stability, leads to unique ergodicity of the filter process.
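For concreteness, the control-free filter process referred to here can be written in a standard form (notation ours, assuming the observation kernel admits a likelihood density \(g\); this is a sketch, not the paper's exact setup):

```latex
\pi_{n+1}(A) \;=\;
\frac{\displaystyle\int_{\mathsf X}\!\int_{A} g(x', Y_{n+1})\, T(\mathrm dx' \mid x)\, \pi_n(\mathrm dx)}
     {\displaystyle\int_{\mathsf X}\!\int_{\mathsf X} g(x', Y_{n+1})\, T(\mathrm dx' \mid x)\, \pi_n(\mathrm dx)},
\qquad A \in \mathcal B(\mathsf X),
```

so that \((\pi_n)\) is itself a Markov chain on the space of probability measures \(\mathcal P(\mathsf X)\); the result above concerns existence and uniqueness of an invariant probability measure for this measure-valued chain.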

Source journal
CiteScore: 3.30
Self-citation rate: 5.60%
Articles published: 103
Review time: >12 weeks
Journal description: The Applied Mathematics and Optimization Journal covers a broad range of mathematical methods, in particular those that bridge with optimization and have some connection with applications. Core topics include calculus of variations, partial differential equations, stochastic control, optimization of deterministic or stochastic systems in discrete or continuous time, homogenization, control theory, mean field games, dynamic games and optimal transport. Algorithmic, data analytic, machine learning and numerical methods which support the modeling and analysis of optimization problems are encouraged. Of great interest are papers which show some novel idea in either the theory or model which include some connection with potential applications in science and engineering.
Latest articles in this journal
Exact Controllability to Nonnegative Trajectory for a Chemotaxis System
Fractional, Semilinear, and Sparse Optimal Control: A Priori Error Bounds
On the Relationship Between Viscosity and Distribution Solutions for Nonlinear Neumann Type PDEs: The Probabilistic Approach
Asymptotic Behavior of Rao–Nakra Sandwich Beam with Nonlinear Localized Damping and Source Terms
Strict Efficiency in Vector Optimization Via a Directional Curvature Functional