Joint-perturbation simultaneous pseudo-gradient

Carlos Martin, Tuomas Sandholm
{"title":"Joint-perturbation simultaneous pseudo-gradient","authors":"Carlos Martin, Tuomas Sandholm","doi":"arxiv-2408.09306","DOIUrl":null,"url":null,"abstract":"We study the problem of computing an approximate Nash equilibrium of a game\nwhose strategy space is continuous without access to gradients of the utility\nfunction. Such games arise, for example, when players' strategies are\nrepresented by the parameters of a neural network. Lack of access to gradients\nis common in reinforcement learning settings, where the environment is treated\nas a black box, as well as equilibrium finding in mechanisms such as auctions,\nwhere the mechanism's payoffs are discontinuous in the players' actions. To\ntackle this problem, we turn to zeroth-order optimization techniques that\ncombine pseudo-gradients with equilibrium-finding dynamics. Specifically, we\nintroduce a new technique that requires a number of utility function\nevaluations per iteration that is constant rather than linear in the number of\nplayers. It achieves this by performing a single joint perturbation on all\nplayers' strategies, rather than perturbing each one individually. This yields\na dramatic improvement for many-player games, especially when the utility\nfunction is expensive to compute in terms of wall time, memory, money, or other\nresources. We evaluate our approach on various games, including auctions, which\nhave important real-world applications. Our approach yields a significant\nreduction in the run time required to reach an approximate Nash equilibrium.","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"24 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Multiagent Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2408.09306","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

We study the problem of computing an approximate Nash equilibrium of a game whose strategy space is continuous without access to gradients of the utility function. Such games arise, for example, when players' strategies are represented by the parameters of a neural network. Lack of access to gradients is common in reinforcement learning settings, where the environment is treated as a black box, as well as equilibrium finding in mechanisms such as auctions, where the mechanism's payoffs are discontinuous in the players' actions. To tackle this problem, we turn to zeroth-order optimization techniques that combine pseudo-gradients with equilibrium-finding dynamics. Specifically, we introduce a new technique that requires a number of utility function evaluations per iteration that is constant rather than linear in the number of players. It achieves this by performing a single joint perturbation on all players' strategies, rather than perturbing each one individually. This yields a dramatic improvement for many-player games, especially when the utility function is expensive to compute in terms of wall time, memory, money, or other resources. We evaluate our approach on various games, including auctions, which have important real-world applications. Our approach yields a significant reduction in the run time required to reach an approximate Nash equilibrium.
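The sketch below is a minimal illustration of the joint-perturbation idea, not the paper's reference implementation. The names and choices in it are assumptions: a hypothetical black-box `utilities(strategies)` that returns every player's payoff from a single evaluation of the game, Gaussian perturbations, a central difference, and placeholder parameters `sigma` and `lr`. What it demonstrates is the property stated in the abstract: the number of game evaluations per iteration stays constant (here, two) regardless of the number of players.

```python
# Minimal sketch of a joint-perturbation simultaneous pseudo-gradient step.
# Assumptions (not taken from the paper): `utilities(strategies)` returns all
# players' payoffs from one game evaluation, perturbations are Gaussian, and a
# central difference is used.
import numpy as np


def joint_perturbation_pseudo_gradient(utilities, strategies, sigma=1e-2, rng=None):
    """Estimate each player's own-strategy pseudo-gradient from two game evaluations.

    utilities:  callable mapping a list of strategy vectors -> array of n payoffs
    strategies: list of n strategy vectors (e.g. flattened neural-network parameters)
    sigma:      perturbation scale
    """
    rng = rng or np.random.default_rng()
    # One joint perturbation direction covering all players at once.
    deltas = [rng.standard_normal(s.shape) for s in strategies]

    # Two game evaluations in total, independent of the number of players;
    # perturbing each player separately would cost on the order of n evaluations.
    u_plus = utilities([s + sigma * d for s, d in zip(strategies, deltas)])
    u_minus = utilities([s - sigma * d for s, d in zip(strategies, deltas)])

    # Central-difference coefficient for each player; cross-player terms cancel
    # in expectation because the perturbation components are independent and zero-mean.
    coeffs = (np.asarray(u_plus) - np.asarray(u_minus)) / (2.0 * sigma)
    return [c * d for c, d in zip(coeffs, deltas)]


def simultaneous_ascent_step(utilities, strategies, lr=1e-2, sigma=1e-2, rng=None):
    """One step of simultaneous pseudo-gradient ascent for all players."""
    grads = joint_perturbation_pseudo_gradient(utilities, strategies, sigma, rng)
    return [s + lr * g for s, g in zip(strategies, grads)]
```

Under these assumptions the cross-player perturbation terms have zero mean, so each player still receives an estimate of its own-strategy gradient of a smoothed utility; the added variance from the shared perturbation is the price paid for the constant evaluation count. Pairing such an estimator with equilibrium-finding dynamics, as in `simultaneous_ascent_step` above, is the kind of combination the abstract describes.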