Quantitative propagation of chaos for mean field Markov decision process with common noise

IF 1.3 | Zone 3, Mathematics | Q2, Statistics & Probability | Electronic Journal of Probability | Pub Date: 2022-07-26 | DOI: 10.1214/23-ejp978

Médéric Motte, H. Pham

Citations: 3

Abstract

We investigate propagation of chaos for mean field Markov decision processes with common noise (CMKV-MDP) when the optimization is performed over randomized open-loop controls on an infinite horizon. We first obtain that the value functions of the $N$-agent control problem with asymmetric open-loop controls converge to the value function of the CMKV-MDP at a rate of order $M_N^\gamma$, where $M_N$ is the mean rate of convergence in Wasserstein distance of the empirical measure and $\gamma \in (0,1]$ is an explicit constant. Furthermore, we show how to explicitly construct $(\epsilon+\mathcal{O}(M_N^\gamma))$-optimal policies for the $N$-agent model from $\epsilon$-optimal policies for the CMKV-MDP. Our approach relies on a sharp comparison between the Bellman operators of the $N$-agent problem and the CMKV-MDP, and on a fine coupling of empirical measures.
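To make the quantity $M_N$ more concrete, the following is a minimal sketch that estimates the mean Wasserstein distance between an empirical measure and its underlying law by Monte Carlo. It is an illustration under stated assumptions, not the paper's construction: the law is taken to be Uniform[0,1] on the real line, the distance is $W_1$, and the function names `w1_to_uniform` and `mean_w1` as well as the value `gamma = 0.5` are hypothetical choices made only for this example.

```python
import random


def w1_to_uniform(samples):
    """Exact W1 distance between the empirical measure of `samples`
    (points in [0, 1]) and the Uniform[0, 1] law.

    In one dimension, W1 is the integral over u in (0, 1) of
    |F_N^{-1}(u) - u|, where F_N^{-1} is the empirical quantile function.
    """
    xs = sorted(samples)
    n = len(xs)
    total = 0.0
    for i, c in enumerate(xs):
        # The empirical quantile equals c on the interval (a, b].
        a, b = i / n, (i + 1) / n
        if c <= a:
            total += (b - a) * ((a + b) / 2 - c)
        elif c >= b:
            total += (b - a) * (c - (a + b) / 2)
        else:
            total += ((c - a) ** 2 + (b - c) ** 2) / 2
    return total


def mean_w1(n, trials=200, seed=0):
    """Monte Carlo estimate of M_N = E[W1(empirical measure, law)]
    for n i.i.d. Uniform[0, 1] samples (assumed setting)."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(trials):
        acc += w1_to_uniform([rng.random() for _ in range(n)])
    return acc / trials


if __name__ == "__main__":
    gamma = 0.5  # hypothetical exponent in (0, 1]; the paper's constant is not given here
    for n in (100, 400, 1600):
        m_n = mean_w1(n)
        print(f"N={n:5d}  M_N~{m_n:.5f}  M_N^gamma~{m_n ** gamma:.5f}")
```

In this one-dimensional setting the estimate shrinks roughly like $N^{-1/2}$, so a bound of order $M_N^\gamma$ would shrink like $N^{-\gamma/2}$ for the assumed exponent.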




Source journal

Electronic Journal of Probability (Mathematics: Statistics & Probability)

CiteScore: 1.80

Self-citation rate: 7.10%

Articles published: 119

Review time: 4-8 weeks

About the journal: The Electronic Journal of Probability (EJP) publishes full-size research articles in probability theory. The Electronic Communications in Probability (ECP), a sister journal of EJP, publishes short notes and research announcements in probability theory. Both ECP and EJP are official journals of the Institute of Mathematical Statistics and the Bernoulli Society.

Latest articles from the journal

A Palm space approach to non-linear Hawkes processes

Stochastic evolution equations with Wick-polynomial nonlinearities

Corrigendum to: The sum of powers of subtree sizes for conditioned Galton–Watson trees

Stochastic sewing in Banach spaces

Convergence rate for geometric statistics of point processes having fast decay of dependence