Continuous-Time Mean Field Markov Decision Models

Applied Mathematics and Optimization | Pub Date: 2024-06-22 | DOI: 10.1007/s00245-024-10154-1 | IF 1.7 | JCR Q2 (Mathematics, Applied) | Zone 2 (Mathematics)

Nicole Bäuerle, Sebastian Höfer


Citations: 0

Abstract




We consider a finite number N of statistically equal agents, each moving on a finite set of states according to a continuous-time Markov Decision Process (MDP). The transition intensities of the agents and the generated rewards depend not only on the state and action of the agent itself, but also on the states of the other agents as well as the chosen action. Interactions of this kind are typical of a wide range of models in biology, epidemiology, finance, social science and queueing systems, among others. The aim is to maximize the expected discounted reward of the system, i.e. the agents have to cooperate as a team. Computationally, this is a difficult task when N is large. Thus, we consider the limit \(N \rightarrow \infty\). In contrast to other papers, we treat this problem from an MDP perspective. This has the advantage that we need fewer regularity assumptions to construct asymptotically optimal strategies than approaches based on viscosity solutions of HJB equations. The convergence rate is \(1/\sqrt{N}\). We show how to apply our results using two examples: a machine replacement problem and a problem from epidemics. We also show that optimal feedback policies from the limiting problem are not necessarily asymptotically optimal.
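The mean-field limit described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' model or method. It assumes a simple two-state SIS epidemic with a fixed policy and no optimization, where each susceptible agent becomes infected at intensity `beta` times the infected fraction and each infected agent recovers at intensity `gamma`. All names and parameter values (`beta`, `gamma`, `m0`, `t_end`) are hypothetical choices for illustration. It compares the simulated infected fraction of an N-agent system with the solution of the limiting ODE dm/dt = beta*m*(1-m) - gamma*m.

```python
import math
import random


def ode_fraction(t, m0, beta, gamma):
    """Closed-form solution of dm/dt = beta*m*(1-m) - gamma*m (logistic form)."""
    r = beta - gamma          # net growth rate, assumed positive here
    k = 1.0 - gamma / beta    # nonzero equilibrium of the ODE
    return k / (1.0 + (k / m0 - 1.0) * math.exp(-r * t))


def simulate_fraction(n, t_end, m0, beta, gamma, rng):
    """Gillespie simulation of the infected count among n exchangeable agents."""
    infected = round(m0 * n)
    t = 0.0
    while True:
        up = (n - infected) * beta * infected / n   # total infection intensity
        down = gamma * infected                     # total recovery intensity
        total = up + down
        if total == 0.0:        # no infected agents left: absorbing state
            break
        t += rng.expovariate(total)                 # waiting time to next jump
        if t > t_end:
            break
        infected += 1 if rng.random() * total < up else -1
    return infected / n


def mean_abs_error(n, runs=10, t_end=5.0, m0=0.1, beta=2.0, gamma=1.0, seed=0):
    """Average |simulated fraction - ODE fraction| at time t_end over several runs."""
    rng = random.Random(seed)
    target = ode_fraction(t_end, m0, beta, gamma)
    errors = [abs(simulate_fraction(n, t_end, m0, beta, gamma, rng) - target)
              for _ in range(runs)]
    return sum(errors) / runs


if __name__ == "__main__":
    for n in (100, 10_000):
        print(n, mean_abs_error(n))
```

Under these assumptions, the average gap between the N-agent system and the ODE should shrink as N grows, which is in line with the \(1/\sqrt{N}\) rate stated in the abstract. The sketch only illustrates that trend for one toy model and does not verify the rate.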


Source journal

Applied Mathematics and Optimization (Mathematics: Applied Mathematics)

CiteScore: 3.30
Self-citation rate: 5.60%
Articles published: 103
Review time: >12 weeks

About the journal: Applied Mathematics and Optimization covers a broad range of mathematical methods, in particular those that connect with optimization and have some link to applications. Core topics include calculus of variations, partial differential equations, stochastic control, optimization of deterministic or stochastic systems in discrete or continuous time, homogenization, control theory, mean field games, dynamic games and optimal transport. Algorithmic, data-analytic, machine learning and numerical methods that support the modeling and analysis of optimization problems are encouraged. Of great interest are papers that present a novel idea in either theory or modeling and have some connection with potential applications in science and engineering.