Purified Policy Space Response Oracles for Symmetric Zero-Sum Games

IEEE Transactions on Neural Networks and Learning Systems (IF 8.9; Zone 1, Computer Science; JCR Q1, Computer Science, Artificial Intelligence). Pub Date: 2024-12-12. DOI: 10.1109/TNNLS.2024.3457509
Zhengdao Shao;Liansheng Zhuang;Yihong Huang;Houqiang Li;Shafei Wang
Volume 36 (6), pages 11258-11270. Full text: https://ieeexplore.ieee.org/document/10795441/

Abstract

Policy space response oracles (PSRO) is a promising tool for finding an approximate Nash equilibrium (NE) in a two-player zero-sum game. It approximates the equilibrium by iteratively expanding a small-scale meta-game formed by a restricted strategy population, which consists of historical approximate best responses of the meta-games. However, because these best responses are strongly correlated with each other, existing PSRO and its variants often exhibit slow growth in the diversity of the strategy population, and thus suffer from poor exploration efficiency and a slow convergence rate. To address this problem, this article proposes Purified PSRO, which deliberately maintains a pure strategy population formed by the pure strategy bases of approximate best responses. A novel module, namely non-best response suppression (NBRS), is introduced to calculate a pure strategy base with better orthogonality, which is used to expand the strategy population at each epoch. In this way, Purified PSRO can quickly increase the diversity of the strategy population and thus greatly enhance the efficiency of exploration. Theoretically, we prove the convergence of Purified PSRO. Moreover, we introduce an early-stop module to reduce computation cost and give an upper bound on the exploitability when the algorithm stops early. Extensive experiments on random games of skill (RGoS) and real-world meta-games show that Purified PSRO can consistently outperform existing state-of-the-art (SOTA) methods, sometimes by a large margin.
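To make the baseline loop described in the abstract concrete, the following is a minimal sketch of a plain PSRO-style (double-oracle) iteration on a symmetric zero-sum matrix game. It is not the paper's Purified PSRO and does not implement NBRS or the early-stop module. Several pieces are assumptions made for illustration: the payoff matrix `A` is antisymmetric and fully known, the oracle is an exact best response over all pure strategies (real PSRO typically trains an approximate one), the meta-solver is fictitious play, and exploitability is taken as the best-response payoff against the mixed strategy, which is valid here because the value of a symmetric zero-sum game is 0. All function names are hypothetical.

```python
def restricted_payoffs(A, population):
    """Sub-matrix of A over the pure strategies currently in the population."""
    return [[A[i][j] for j in population] for i in population]


def fictitious_play(M, iterations=2000):
    """Approximate a symmetric equilibrium of the antisymmetric matrix M.

    Assumed meta-solver: repeatedly best-respond to the empirical mixture.
    """
    n = len(M)
    counts = [1.0] * n  # start from a uniform empirical mixture
    for _ in range(iterations):
        values = [sum(M[i][j] * counts[j] for j in range(n)) for i in range(n)]
        best = max(range(n), key=values.__getitem__)
        counts[best] += 1.0
    total = sum(counts)
    return [c / total for c in counts]


def best_response(A, mixed):
    """Exact best pure response to `mixed` and its payoff (the exploitability)."""
    values = [sum(a * q for a, q in zip(row, mixed)) for row in A]
    best = max(range(len(A)), key=values.__getitem__)
    return best, values[best]


def psro(A, start=0, max_epochs=50):
    """Grow the population with best responses to the meta-game solution."""
    population = [start]
    mixed, exploitability = None, None
    for _ in range(max_epochs):
        meta = fictitious_play(restricted_payoffs(A, population))
        # Lift the meta-strategy back to the full pure-strategy space.
        mixed = [0.0] * len(A)
        for idx, prob in zip(population, meta):
            mixed[idx] = prob
        br, exploitability = best_response(A, mixed)
        if br in population:  # no new strategy to add, so stop
            break
        population.append(br)
    return population, mixed, exploitability


# Rock-paper-scissors: entry [i][j] is the payoff of playing i against j.
RPS = [[0, -1, 1],
       [1, 0, -1],
       [-1, 1, 0]]
population, mixed, exploitability = psro(RPS)
```

In this toy run each new population member is a best response to the previous meta-strategy, which illustrates how the population is built from historical best responses; the correlation and diversity issues the abstract raises concern much larger strategy spaces than this one.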
Source journal
IEEE Transactions on Neural Networks and Learning Systems
Categories: Computer Science, Artificial Intelligence; Computer Science, Hardware & Architecture
CiteScore: 23.80
Self-citation rate: 9.60%
Articles published: 2102
Review time: 3-8 weeks
About the journal: The focus of IEEE Transactions on Neural Networks and Learning Systems is to present scholarly articles discussing the theory, design, and applications of neural networks as well as other learning systems. The journal primarily highlights technical and scientific research in this domain.