Joint-Perturbation Simultaneous Pseudo-Gradient

Carlos Martin, Tuomas Sandholm
{"title":"联合扰动同步伪梯度","authors":"Carlos Martin, Tuomas Sandholm","doi":"arxiv-2408.09306","DOIUrl":null,"url":null,"abstract":"We study the problem of computing an approximate Nash equilibrium of a game\nwhose strategy space is continuous without access to gradients of the utility\nfunction. Such games arise, for example, when players' strategies are\nrepresented by the parameters of a neural network. Lack of access to gradients\nis common in reinforcement learning settings, where the environment is treated\nas a black box, as well as equilibrium finding in mechanisms such as auctions,\nwhere the mechanism's payoffs are discontinuous in the players' actions. To\ntackle this problem, we turn to zeroth-order optimization techniques that\ncombine pseudo-gradients with equilibrium-finding dynamics. Specifically, we\nintroduce a new technique that requires a number of utility function\nevaluations per iteration that is constant rather than linear in the number of\nplayers. It achieves this by performing a single joint perturbation on all\nplayers' strategies, rather than perturbing each one individually. This yields\na dramatic improvement for many-player games, especially when the utility\nfunction is expensive to compute in terms of wall time, memory, money, or other\nresources. We evaluate our approach on various games, including auctions, which\nhave important real-world applications. Our approach yields a significant\nreduction in the run time required to reach an approximate Nash equilibrium.","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"24 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Joint-perturbation simultaneous pseudo-gradient\",\"authors\":\"Carlos Martin, Tuomas Sandholm\",\"doi\":\"arxiv-2408.09306\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We study the problem of computing an approximate Nash equilibrium of a game\\nwhose strategy space is continuous without access to gradients of the utility\\nfunction. Such games arise, for example, when players' strategies are\\nrepresented by the parameters of a neural network. Lack of access to gradients\\nis common in reinforcement learning settings, where the environment is treated\\nas a black box, as well as equilibrium finding in mechanisms such as auctions,\\nwhere the mechanism's payoffs are discontinuous in the players' actions. To\\ntackle this problem, we turn to zeroth-order optimization techniques that\\ncombine pseudo-gradients with equilibrium-finding dynamics. Specifically, we\\nintroduce a new technique that requires a number of utility function\\nevaluations per iteration that is constant rather than linear in the number of\\nplayers. It achieves this by performing a single joint perturbation on all\\nplayers' strategies, rather than perturbing each one individually. This yields\\na dramatic improvement for many-player games, especially when the utility\\nfunction is expensive to compute in terms of wall time, memory, money, or other\\nresources. We evaluate our approach on various games, including auctions, which\\nhave important real-world applications. 
Our approach yields a significant\\nreduction in the run time required to reach an approximate Nash equilibrium.\",\"PeriodicalId\":501315,\"journal\":{\"name\":\"arXiv - CS - Multiagent Systems\",\"volume\":\"24 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-08-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Multiagent Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2408.09306\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Multiagent Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2408.09306","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

We study the problem of computing an approximate Nash equilibrium of a game whose strategy space is continuous without access to gradients of the utility function. Such games arise, for example, when players' strategies are represented by the parameters of a neural network. Lack of access to gradients is common in reinforcement learning settings, where the environment is treated as a black box, as well as in equilibrium finding for mechanisms such as auctions, where the mechanism's payoffs are discontinuous in the players' actions. To tackle this problem, we turn to zeroth-order optimization techniques that combine pseudo-gradients with equilibrium-finding dynamics. Specifically, we introduce a new technique that requires a number of utility function evaluations per iteration that is constant rather than linear in the number of players. It achieves this by performing a single joint perturbation on all players' strategies, rather than perturbing each one individually. This yields a dramatic improvement for many-player games, especially when the utility function is expensive to compute in terms of wall time, memory, money, or other resources. We evaluate our approach on various games, including auctions, which have important real-world applications. Our approach yields a significant reduction in the run time required to reach an approximate Nash equilibrium.
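The abstract describes the estimator only in words. Below is a minimal illustrative sketch in Python of how a single joint perturbation can yield a pseudo-gradient for every player from just two black-box evaluations; the function names, the Rademacher (SPSA-style) perturbations, the `utility` interface that returns all players' payoffs in one call, and the plain simultaneous-ascent loop are assumptions made for illustration, not necessarily the paper's exact construction.

```python
import numpy as np

def joint_perturbation_pseudo_gradient(utility, strategies, eps, rng):
    """Estimate every player's own-strategy gradient from one joint perturbation.

    utility: black-box callable; takes a list of strategy vectors (one per
             player) and returns an array of per-player utilities.
    strategies: list of 1-D numpy arrays, one per player.
    eps: perturbation radius.
    """
    # One Rademacher (+/-1) direction per player, drawn jointly.
    deltas = [rng.choice([-1.0, 1.0], size=s.shape) for s in strategies]
    # Exactly two black-box evaluations, independent of the number of players.
    u_plus = utility([s + eps * d for s, d in zip(strategies, deltas)])
    u_minus = utility([s - eps * d for s, d in zip(strategies, deltas)])
    # SPSA-style central difference: pair player i's utility change with
    # player i's own perturbation (for +/-1 entries, dividing by the
    # perturbation equals multiplying by it).
    return [(u_plus[i] - u_minus[i]) / (2.0 * eps) * deltas[i]
            for i in range(len(strategies))]

def simultaneous_pseudo_gradient_ascent(utility, strategies, steps, lr, eps, seed=0):
    """Couple the estimator with simultaneous gradient ascent as the dynamic."""
    rng = np.random.default_rng(seed)
    strategies = [s.copy() for s in strategies]
    for _ in range(steps):
        grads = joint_perturbation_pseudo_gradient(utility, strategies, eps, rng)
        strategies = [s + lr * g for s, g in zip(strategies, grads)]
    return strategies
```

Perturbing each of n players separately in this style would cost on the order of 2n utility evaluations per iteration; the joint perturbation uses two, which is where the constant-rather-than-linear cost claimed in the abstract comes from.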