A Scalable Game Theoretic Approach for Coordination of Multiple Dynamic Systems

Mostafa M. Shibl, Vijay Gupta
{"title":"A Scalable Game Theoretic Approach for Coordination of Multiple Dynamic Systems","authors":"Mostafa M. Shibl, Vijay Gupta","doi":"arxiv-2409.11358","DOIUrl":null,"url":null,"abstract":"Learning in games provides a powerful framework to design control policies\nfor self-interested agents that may be coupled through their dynamics, costs,\nor constraints. We consider the case where the dynamics of the coupled system\ncan be modeled as a Markov potential game. In this case, distributed learning\nby the agents ensures that their control policies converge to a Nash\nequilibrium of this game. However, typical learning algorithms such as natural\npolicy gradient require knowledge of the entire global state and actions of all\nthe other agents, and may not be scalable as the number of agents grows. We\nshow that by limiting the information flow to a local neighborhood of agents in\nthe natural policy gradient algorithm, we can converge to a neighborhood of\noptimal policies. If the game can be designed through decomposing a global cost\nfunction of interest to a designer into local costs for the agents such that\ntheir policies at equilibrium optimize the global cost, this approach can be of\ninterest to team coordination problems as well. We illustrate our approach\nthrough a sensor coverage problem.","PeriodicalId":501175,"journal":{"name":"arXiv - EE - Systems and Control","volume":"3 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - EE - Systems and Control","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.11358","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Learning in games provides a powerful framework to design control policies for self-interested agents that may be coupled through their dynamics, costs, or constraints. We consider the case where the dynamics of the coupled system can be modeled as a Markov potential game. In this case, distributed learning by the agents ensures that their control policies converge to a Nash equilibrium of this game. However, typical learning algorithms such as natural policy gradient require knowledge of the entire global state and the actions of all the other agents, and may not be scalable as the number of agents grows. We show that by limiting the information flow to a local neighborhood of agents in the natural policy gradient algorithm, we can converge to a neighborhood of the optimal policies. If the game can be designed by decomposing a global cost function of interest to a designer into local costs for the agents, such that the agents' equilibrium policies optimize the global cost, this approach is also of interest for team coordination problems. We illustrate our approach through a sensor coverage problem.
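The localization idea can be pictured with a toy sketch. The Python snippet below is not the paper's algorithm: it uses a plain REINFORCE-style softmax policy-gradient update instead of the natural policy gradient, and the line-graph dynamics and "coverage" reward are invented purely for illustration. The point it shows is that every quantity agent i uses (its observation, policy, reward, and gradient estimate) depends only on the states of its κ-hop neighborhood rather than the global state.

```python
# Illustrative sketch only -- NOT the paper's algorithm.  It replaces the
# natural policy gradient with a plain REINFORCE update, and the line-graph
# dynamics and "coverage" reward are invented for illustration.  What it does
# show is the localization idea: every quantity agent i uses (observation,
# policy, reward, gradient) depends only on its KAPPA-hop neighbourhood.
import numpy as np

N, KAPPA, T, EPISODES, LR = 5, 1, 10, 2000, 0.1
rng = np.random.default_rng(0)

def neighbours(i):
    # Agents within KAPPA hops of agent i on a line graph.
    return list(range(max(0, i - KAPPA), min(N, i + KAPPA + 1)))

def local_obs(state, i):
    # Encode the binary states of agent i's neighbourhood as one integer.
    return int("".join(str(b) for b in state[neighbours(i)]), 2)

n_obs = 2 ** (2 * KAPPA + 1)                       # size of a full neighbourhood
theta = [np.zeros((n_obs, 2)) for _ in range(N)]   # softmax logits per agent

def sample_action(i, obs):
    logits = theta[i][obs]
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return int(rng.choice(2, p=p)), p

for _ in range(EPISODES):
    state = rng.integers(0, 2, size=N)
    score = [np.zeros_like(th) for th in theta]    # sum of grad-log-pi terms
    ret = np.zeros(N)                              # per-agent episode return
    for _ in range(T):
        obs = [local_obs(state, i) for i in range(N)]
        acts, probs = zip(*(sample_action(i, obs[i]) for i in range(N)))
        for i in range(N):
            # Toy local reward: cover a cell the left neighbour left empty.
            left = state[i - 1] if i > 0 else 0
            ret[i] += 1.0 if acts[i] != left else 0.0
            # Gradient of log softmax w.r.t. the logits: one_hot(a) - p.
            g = -probs[i].copy()
            g[acts[i]] += 1.0
            score[i][obs[i]] += g
        state = np.array(acts)                     # next state = last actions
    for i in range(N):
        # REINFORCE step using only agent i's own (local) return.
        theta[i] += LR * ret[i] * score[i] / T
```

In the paper's setting the update would additionally be preconditioned to obtain a natural policy gradient step; the sketch omits that to stay short and only illustrates the restriction of information flow to a local neighborhood.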