{"title":"A Scalable Game Theoretic Approach for Coordination of Multiple Dynamic Systems","authors":"Mostafa M. Shibl, Vijay Gupta","doi":"arxiv-2409.11358","DOIUrl":null,"url":null,"abstract":"Learning in games provides a powerful framework to design control policies\nfor self-interested agents that may be coupled through their dynamics, costs,\nor constraints. We consider the case where the dynamics of the coupled system\ncan be modeled as a Markov potential game. In this case, distributed learning\nby the agents ensures that their control policies converge to a Nash\nequilibrium of this game. However, typical learning algorithms such as natural\npolicy gradient require knowledge of the entire global state and actions of all\nthe other agents, and may not be scalable as the number of agents grows. We\nshow that by limiting the information flow to a local neighborhood of agents in\nthe natural policy gradient algorithm, we can converge to a neighborhood of\noptimal policies. If the game can be designed through decomposing a global cost\nfunction of interest to a designer into local costs for the agents such that\ntheir policies at equilibrium optimize the global cost, this approach can be of\ninterest to team coordination problems as well. We illustrate our approach\nthrough a sensor coverage problem.","PeriodicalId":501175,"journal":{"name":"arXiv - EE - Systems and Control","volume":"3 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - EE - Systems and Control","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.11358","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Learning in games provides a powerful framework for designing control policies for self-interested agents that may be coupled through their dynamics, costs, or constraints. We consider the case where the coupled system can be modeled as a Markov potential game. In this case, distributed learning by the agents ensures that their control policies converge to a Nash equilibrium of this game.
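For reference, the defining property of a Markov potential game (stated here in its standard form from the literature; the paper's exact formulation may differ in details) is the existence of a single potential function $\Phi$ whose change under any unilateral policy deviation equals the deviating agent's change in value:

```latex
V_i^{\pi_i,\pi_{-i}}(s) - V_i^{\pi_i',\pi_{-i}}(s)
  \;=\; \Phi^{\pi_i,\pi_{-i}}(s) - \Phi^{\pi_i',\pi_{-i}}(s)
  \qquad \text{for all } s,\ i,\ \pi_i,\ \pi_i',\ \pi_{-i}.
```

Because every agent's incentive is summarized by the common function $\Phi$, independent policy-gradient updates behave like gradient ascent on the potential, which is what makes convergence to a Nash equilibrium possible.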
However, typical learning algorithms, such as natural policy gradient, require knowledge of the entire global state and the actions of all other agents, and may not scale as the number of agents grows. We show that by limiting the information flow in the natural policy gradient algorithm to a local neighborhood of agents, the learned policies converge to a neighborhood of the optimal policies.
If the game can be designed by decomposing a global cost function of interest to a designer into local costs for the agents, such that the agents' equilibrium policies optimize the global cost, this approach is also of interest for team coordination problems. We illustrate our approach through a sensor coverage problem.
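One classical way to carry out such a decomposition (well known from the coverage-game literature; the paper's design may differ) is the marginal-contribution construction, which charges each agent the change in the global cost $C$ caused by its own action:

```latex
c_i(a_i, a_{-i}) \;=\; C(a_i, a_{-i}) \;-\; C(a_i^{0}, a_{-i}),
```

where $a_i^{0}$ is a fixed baseline action (e.g., a sensor switched off). Under these local costs, every unilateral deviation changes an agent's cost exactly as it changes $C$, so the game is an exact potential game with potential $\Phi = C$, and at a Nash equilibrium no single agent can reduce the global cost.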