Continuous-Time Mean Field Markov Decision Models

Applied Mathematics and Optimization (IF 1.6; Zone 2, Mathematics; JCR Q2, MATHEMATICS, APPLIED). Pub Date: 2024-06-22. DOI: 10.1007/s00245-024-10154-1
Nicole Bäuerle, Sebastian Höfer
Citations: 0

Abstract

We consider a finite number N of statistically equal agents, each moving on a finite set of states according to a continuous-time Markov Decision Process (MDP). The transition intensities of the agents and the generated rewards depend not only on the state and action of the agent itself, but also on the states of the other agents as well as the chosen action. Interactions like this are typical for a wide range of models in, e.g., biology, epidemics, finance, social science and queueing systems. The aim is to maximize the expected discounted reward of the system, i.e. the agents have to cooperate as a team. Computationally this is a difficult task when N is large. Thus, we consider the limit \(N \rightarrow \infty\). In contrast to other papers, we treat this problem from an MDP perspective. This has the advantage that we need fewer regularity assumptions to construct asymptotically optimal strategies than approaches using viscosity solutions of HJB equations. The convergence rate is \(1/\sqrt{N}\). We show how to apply our results using two examples: a machine replacement problem and a problem from epidemics. We also show that optimal feedback policies from the limiting problem are not necessarily asymptotically optimal.
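
To make the \(1/\sqrt{N}\) statement concrete, the following is a minimal Python sketch, not the model or the examples from the paper. It assumes a hypothetical two-state SIS-type epidemic under a fixed, constant action (so no optimization takes place): each susceptible agent becomes infected at intensity `beta * m`, where `m` is the current infected fraction, and each infected agent recovers at intensity `gamma`. The function names and all parameter values are illustrative assumptions. The N-agent system is simulated with a Gillespie scheme and compared with the deterministic mean-field limit dm/dt = beta*m*(1-m) - gamma*m, for which a closed-form logistic solution exists.

```python
import math
import random


def mean_field_fraction(t, m0, beta, gamma):
    """Closed-form solution of dm/dt = beta*m*(1-m) - gamma*m.

    Assumes beta > gamma and 0 < m0 (illustrative toy model only).
    """
    r = beta - gamma          # net growth rate near m = 0
    m_star = r / beta         # endemic equilibrium of the limit ODE
    return m_star / (1.0 + (m_star / m0 - 1.0) * math.exp(-r * t))


def simulate_fraction(n_agents, t_end, m0, beta, gamma, rng):
    """Gillespie simulation of the N-agent system; returns the infected fraction at t_end."""
    infected = round(m0 * n_agents)
    t = 0.0
    while infected > 0:
        # Aggregate intensities: each agent's rate depends on the empirical fraction.
        up = (n_agents - infected) * beta * infected / n_agents
        down = gamma * infected
        total = up + down
        t += rng.expovariate(total)   # time until the next jump
        if t > t_end:
            break
        if rng.random() * total < up:
            infected += 1             # one susceptible agent becomes infected
        else:
            infected -= 1             # one infected agent recovers
    return infected / n_agents


def rms_error(n_agents, runs=40, seed=0, t_end=5.0, m0=0.2, beta=2.0, gamma=1.0):
    """Root-mean-square gap between the simulated fraction and the mean-field limit."""
    rng = random.Random(seed)
    limit = mean_field_fraction(t_end, m0, beta, gamma)
    squared = [
        (simulate_fraction(n_agents, t_end, m0, beta, gamma, rng) - limit) ** 2
        for _ in range(runs)
    ]
    return math.sqrt(sum(squared) / runs)


if __name__ == "__main__":
    for n in (100, 400, 1600):
        err = rms_error(n)
        # If the error scales like 1/sqrt(N), err * sqrt(N) stays roughly constant.
        print(n, round(err, 4), round(err * math.sqrt(n), 3))
```

In this toy setting the last printed column should stay of the same order of magnitude as N grows, which is what a \(1/\sqrt{N}\) rate would suggest. The sketch only illustrates the approximation of the state process for a fixed action; the paper's result concerns the value of the control problem.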

Source journal
CiteScore: 3.30
Self-citation rate: 5.60%
Articles published: 103
Review time: >12 weeks
Journal description: Applied Mathematics and Optimization covers a broad range of mathematical methods, in particular those that bridge with optimization and have some connection with applications. Core topics include calculus of variations, partial differential equations, stochastic control, optimization of deterministic or stochastic systems in discrete or continuous time, homogenization, control theory, mean field games, dynamic games and optimal transport. Algorithmic, data analytic, machine learning and numerical methods that support the modeling and analysis of optimization problems are encouraged. Of great interest are papers that present a novel idea in either the theory or the model and that include some connection with potential applications in science and engineering.
Latest articles from this journal
Null Controllability of Coupled Parabolic Systems with Switching Control
Pullback Measure Attractors for Non-autonomous Fractional Stochastic Reaction-Diffusion Equations on Unbounded Domains
Longtime Dynamics for a Class of Strongly Damped Wave Equations with Variable Exponent Nonlinearities
On the Local Existence of Solutions to the Fluid–Structure Interaction Problem with a Free Interface
A Stochastic Non-zero-Sum Game of Controlling the Debt-to-GDP Ratio