Reinforcement Nash Equilibrium Solver

Xinrun Wang, Chang Yang, Shuxin Li, Pengdeng Li, Xiao Huang, Hau Chan, Bo An

Adaptive Agents and Multi-Agent Systems · Published 2024-05-06 · DOI: 10.5555/3635637.3663224
Citations: 0
Abstract
Nash Equilibrium (NE) is the canonical solution concept of game theory, providing an elegant tool for understanding rationality. Though a mixed-strategy NE exists in any game with finitely many players and actions, computing an NE in two- or multi-player general-sum games is PPAD-complete. Various alternative solution concepts, e.g., Correlated Equilibrium (CE), and learning methods, e.g., fictitious play (FP), have been proposed to approximate NE. For convenience, we call these methods "inexact solvers", or "solvers" for short. However, the alternative solutions differ from NE, and the learning methods generally fail to converge to NE. Therefore, in this work, we propose the REinforcement Nash Equilibrium Solver (RENES), which trains a single policy to modify games of different sizes and applies the solvers to the modified games, where the obtained solutions are evaluated on the original games. Specifically, our contributions are threefold. i) We represent games as $\alpha$-rank response graphs and leverage a graph neural network (GNN) to handle games of different sizes as inputs; ii) we use tensor decomposition, e.g., canonical polyadic (CP) decomposition, to fix the dimension of the modification actions across games of different sizes; iii) we train the modification policy with the widely used proximal policy optimization (PPO) algorithm and apply the solvers to the modified games, evaluating the obtained solutions on the original games. Extensive experiments on large-scale normal-form games show that our method further improves the NE approximation of different solvers, i.e., $\alpha$-rank, CE, FP, and PRD (projected replicator dynamics), and generalizes to unseen games.
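To make the pipeline concrete, below is a minimal sketch of an $\alpha$-rank-style response graph, the game representation fed to the GNN in contribution i). It is illustrative only, not the authors' implementation: nodes are pure-strategy profiles, and each directed edge to a unilateral deviation is weighted by a logistic function of the deviation gain, following the $\alpha$-rank transition model.

```python
# Illustrative alpha-rank-style response graph for an n-player normal-form
# game; `payoffs[p]` is player p's payoff tensor indexed by pure profiles.
import itertools
import numpy as np

def response_graph(payoffs, alpha=10.0):
    """Map each (profile, one-player deviation) pair to a transition weight."""
    shape = payoffs[0].shape
    edges = {}
    for prof in itertools.product(*(range(s) for s in shape)):
        for p, num_actions in enumerate(shape):
            for a in range(num_actions):
                if a == prof[p]:
                    continue  # only unilateral deviations form edges
                nxt = prof[:p] + (a,) + prof[p + 1:]
                gain = payoffs[p][nxt] - payoffs[p][prof]
                # Logistic (Fermi) transition: higher gain -> likelier switch.
                edges[(prof, nxt)] = 1.0 / (1.0 + np.exp(-alpha * gain))
    return edges
```

And here is a hedged sketch of the outer RENES loop for a bimatrix game, covering contributions ii) and iii), with fictitious play as the inexact solver. The `cp_modification` helper and the reward signal are assumptions in the spirit of the abstract: in the paper the modification action comes from a PPO-trained policy rather than the random vector used here, and solution quality is measured on the original game.

```python
# A self-contained sketch of the RENES loop on a random bimatrix game:
# build a low-rank payoff modification from a fixed-size action vector,
# solve the MODIFIED game with fictitious play, then score the result by
# its NashConv (exploitability) on the ORIGINAL game.
import numpy as np

def nash_conv(A, B, x, y):
    """Sum of best-response gains; zero exactly when (x, y) is an NE."""
    return (A @ y).max() - x @ A @ y + (x @ B).max() - x @ B @ y

def fictitious_play(A, B, iters=2000):
    """Plain fictitious play; returns the empirical mixed strategies."""
    m, n = A.shape
    xc, yc = np.ones(m), np.ones(n)
    for _ in range(iters):
        x, y = xc / xc.sum(), yc / yc.sum()
        xc[np.argmax(A @ y)] += 1  # row player's best response
        yc[np.argmax(x @ B)] += 1  # column player's best response
    return xc / xc.sum(), yc / yc.sum()

def cp_modification(action, m, n, rank=2):
    """Low-rank modification Delta = sum_r outer(u_r, v_r) built from a
    fixed-length action vector: the CP trick that keeps the policy's
    action dimension constant across (padded) game sizes."""
    u = action[:rank * m].reshape(rank, m)
    v = action[rank * m:].reshape(rank, n)
    return np.einsum("ri,rj->ij", u, v)

rng = np.random.default_rng(0)
m = n = 5
A, B = rng.uniform(-1, 1, (m, n)), rng.uniform(-1, 1, (m, n))

baseline = nash_conv(A, B, *fictitious_play(A, B))
action = 0.1 * rng.standard_normal(2 * (m + n))  # stand-in for a PPO policy output
delta = cp_modification(action, m, n)
x, y = fictitious_play(A + delta, B + delta)     # solve the modified game
reward = baseline - nash_conv(A, B, x, y)        # improvement on the original game
print(f"baseline NashConv {baseline:.4f}, after modification {baseline - reward:.4f}")
```

In the full method, PPO would treat `action` as its continuous action, a GNN embedding of the response graph as its observation, and the NashConv improvement as its reward; any of the other solvers ($\alpha$-rank, CE, PRD) can stand in for fictitious play in this loop.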