Cost Inference for Feedback Dynamic Games from Noisy Partial State Observations and Incomplete Trajectories

Adaptive Agents and Multi-Agent Systems Pub Date : 2023-01-04 DOI:10.48550/arXiv.2301.01398

Jingqi Li, Chih-Yuan Chiu, Lasse Peters, S. Sojoudi, C. Tomlin, David Fridovich-Keil

Citations: 4

Abstract

In multi-agent dynamic games, the Nash equilibrium state trajectory of each agent is determined by its cost function and the information pattern of the game. However, the cost and trajectory of each agent may be unavailable to the other agents. Prior work on using partial observations to infer the costs in dynamic games assumes an open-loop information pattern. In this work, we demonstrate that the feedback Nash equilibrium concept is more expressive and encodes more complex behavior. It is desirable to develop specific tools for inferring players' objectives in feedback games. Therefore, we consider the dynamic game cost inference problem under the feedback information pattern, using only partial state observations and incomplete trajectory data. To this end, we first propose an inverse feedback game loss function, whose minimizer yields a feedback Nash equilibrium state trajectory closest to the observation data. We characterize the landscape and differentiability of the loss function. Given the difficulty of obtaining the exact gradient, our main contribution is an efficient gradient approximator, which enables a novel inverse feedback game solver that minimizes the loss using first-order optimization. In thorough empirical evaluations, we demonstrate that our algorithm converges reliably and has better robustness and generalization performance than the open-loop baseline method when the observation data reflects a group of players acting in a feedback Nash game.
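The inverse feedback game loss described in the abstract can be illustrated on a toy problem. The sketch below is not the authors' solver: it assumes a two-player linear-quadratic game with made-up matrices `A`, `B`, `R`, a single unknown cost weight `theta` for player 1, an observation matrix `C` that exposes only the first state component, and an index set `OBS_IDX` that keeps every other time step. The feedback Nash equilibrium is computed with the standard coupled Riccati recursion for LQ games, and a central finite difference stands in for the paper's gradient approximator, whose details the abstract does not give.

```python
import numpy as np

# Toy two-player LQ game. All matrices are illustrative assumptions.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = [np.array([[0.0], [0.1]]), np.array([[0.1], [0.0]])]
R = [np.eye(1), np.eye(1)]
C = np.array([[1.0, 0.0]])     # partial observation: first state component only
OBS_IDX = np.arange(0, 21, 2)  # incomplete trajectory: every other time step


def fbne_rollout(theta, x0, T=20):
    """State trajectory under the feedback Nash equilibrium of the toy game.

    theta scales player 1's state cost; player 2's cost is fixed.
    """
    Q = [theta * np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]
    Z = [Q[0].copy(), Q[1].copy()]  # terminal value matrices
    closed_loop = []
    for _ in range(T):  # backward pass over time
        # Coupled linear system for the feedback gains P1, P2 (u_i = -P_i x).
        S = np.block([
            [R[0] + B[0].T @ Z[0] @ B[0], B[0].T @ Z[0] @ B[1]],
            [B[1].T @ Z[1] @ B[0], R[1] + B[1].T @ Z[1] @ B[1]],
        ])
        Y = np.vstack([B[0].T @ Z[0] @ A, B[1].T @ Z[1] @ A])
        P = np.linalg.solve(S, Y)  # row i holds P_i
        F = A - B[0] @ P[0:1] - B[1] @ P[1:2]  # closed-loop dynamics
        Z = [Q[i] + P[i:i + 1].T @ R[i] @ P[i:i + 1] + F.T @ Z[i] @ F
             for i in range(2)]
        closed_loop.append(F)
    closed_loop.reverse()  # reorder gains from t = 0 to t = T - 1
    xs = [np.asarray(x0, dtype=float)]
    for F in closed_loop:
        xs.append(F @ xs[-1])
    return np.array(xs)  # shape (T + 1, 2)


def loss(theta, x0, y):
    """Squared error between observed outputs and the equilibrium rollout."""
    pred = fbne_rollout(theta, x0)[OBS_IDX] @ C.T
    return float(np.sum((pred - y) ** 2))


def fd_grad(theta, x0, y, eps=1e-5):
    """Central finite difference; a stand-in for a gradient approximator."""
    return (loss(theta + eps, x0, y) - loss(theta - eps, x0, y)) / (2 * eps)


def descend(theta, x0, y, iters=50, step0=1e3):
    """Projected gradient descent with backtracking; keeps theta >= 0."""
    for _ in range(iters):
        g, cur, step = fd_grad(theta, x0, y), loss(theta, x0, y), step0
        while step > 1e-12:
            cand = max(theta - step * g, 0.0)
            if loss(cand, x0, y) < cur:  # accept only strict improvements
                theta = cand
                break
            step *= 0.5
    return theta


# Noisy, partial, incomplete observations generated with theta = 2.0.
rng = np.random.default_rng(0)
x0 = np.array([1.0, 0.5])
y = fbne_rollout(2.0, x0)[OBS_IDX] @ C.T
y = y + 0.01 * rng.standard_normal(y.shape)
theta_hat = descend(0.5, x0, y)
```

Because `descend` accepts a step only when the loss strictly decreases, the loss at `theta_hat` is never worse than at the initial guess. How close `theta_hat` comes to the generating value depends on the noise level and on how strongly the observed component responds to `theta`.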
