Policy Gradient Methods Find the Nash Equilibrium in N-player General-sum Linear-quadratic Games

CompSciRN: Other Machine Learning (Topic). Pub Date: 2021-07-27. DOI: 10.2139/ssrn.3894471

B. Hambly, Renyuan Xu, Huining Yang

Citations: 13

Abstract

We consider a general-sum N-player linear-quadratic game with stochastic dynamics over a finite horizon and prove the global convergence of the natural policy gradient method to the Nash equilibrium. The convergence proof requires a certain amount of noise in the system: we give a condition, essentially a lower bound on the covariance of the noise in terms of the model parameters, that guarantees convergence. Numerical experiments illustrate the results and show that, even in situations where the policy gradient method may not converge in the deterministic setting, the addition of noise leads to convergence.



