Title: Purified Policy Space Response Oracles for Symmetric Zero-Sum Games
Authors: Zhengdao Shao; Liansheng Zhuang; Yihong Huang; Houqiang Li; Shafei Wang
Journal: IEEE Transactions on Neural Networks and Learning Systems, vol. 36, no. 6, pp. 11258-11270
DOI: 10.1109/TNNLS.2024.3457509
Publication date: 2024-12-12
URL: https://ieeexplore.ieee.org/document/10795441/
Citations: 0
Abstract
Policy space response oracles (PSRO) is a promising framework for finding an approximate Nash equilibrium (NE) in a two-player zero-sum game. It approaches the equilibrium by iteratively expanding a small-scale meta-game formed over a restricted strategy population consisting of the historical approximate best responses to earlier meta-games. However, because these best responses are strongly correlated with one another, existing PSRO and its variants often suffer from slow diversity growth in the strategy population, and hence from poor exploration efficiency and a slow convergence rate. To address this problem, this article proposes Purified PSRO, which deliberately maintains a pure strategy population formed from the pure strategy bases of approximate best responses. A novel module, non-best response suppression (NBRS), is introduced to compute a pure strategy base with better orthogonality, which is used to expand the strategy population at each epoch. In this way, Purified PSRO can quickly increase the diversity of the strategy population and thus greatly enhance exploration efficiency. Theoretically, we prove the convergence of Purified PSRO. Moreover, we introduce an early-stopping module to reduce computation cost and give an upper bound on the exploitability when the algorithm stops early. Extensive experiments on random games of skill (RGoS) and real-world meta-games show that Purified PSRO consistently outperforms existing state-of-the-art (SOTA) methods, sometimes by a large margin.
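For readers unfamiliar with the PSRO loop the abstract builds on, the sketch below shows a minimal vanilla PSRO iteration on a small symmetric zero-sum game given by a random antisymmetric payoff matrix (an RGoS-style toy). The fictitious-play meta-solver, the exact-argmax best-response oracle, and all function names here are illustrative assumptions, not the paper's implementation; in particular, the NBRS purification step that distinguishes Purified PSRO is not reproduced.

```python
import numpy as np

def meta_nash(M, iters=2000):
    """Approximate a Nash mixture of the restricted symmetric zero-sum
    meta-game (antisymmetric payoff matrix M) via fictitious play."""
    counts = np.ones(M.shape[0])
    for _ in range(iters):
        sigma = counts / counts.sum()
        counts[np.argmax(M @ sigma)] += 1  # best response to current mixture
    return counts / counts.sum()

def psro(payoffs, epochs=10, seed=0):
    """Vanilla PSRO: grow a restricted population of pure-strategy indices
    by repeatedly best-responding to the meta-game's Nash mixture."""
    rng = np.random.default_rng(seed)
    pop = [int(rng.integers(payoffs.shape[0]))]      # initial population
    for _ in range(epochs):
        sub = payoffs[np.ix_(pop, pop)]              # restricted meta-game
        sigma = meta_nash(sub)                       # meta-Nash over population
        values = payoffs[:, pop] @ sigma             # payoff of each pure strategy
        br = int(np.argmax(values))                  # oracle: exact best response
        if br in pop:                                # BR already present: stop
            break
        pop.append(br)
    return pop

# Hypothetical RGoS-style game: antisymmetric payoff matrix over 50 strategies.
rng = np.random.default_rng(1)
A = rng.normal(size=(50, 50))
payoffs = A - A.T
pop = psro(payoffs)
```

Because the oracle only ever adds a strategy not yet in the population, correlation between successive best responses shows up as early termination or slow population growth; the paper's NBRS module targets exactly this by extracting more orthogonal pure strategy bases before expansion.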
About the Journal:
The focus of IEEE Transactions on Neural Networks and Learning Systems is to present scholarly articles discussing the theory, design, and applications of neural networks as well as other learning systems. The journal primarily highlights technical and scientific research in this domain.