{"title":"The Value Function with Regret Minimization Algorithm for Solving the Nash Equilibrium of Multi-Agent Stochastic Game","authors":"Luping Liu, Wensheng Jia","doi":"10.2991/ijcis.d.210520.001","journal":{"name":"Int. J. Comput. Intell. Syst.","volume":"25 1","pages":"1633-1641"},"publicationDate":"2021-05-01","publicationTypes":"Journal Article","isOpenAccess":false,"citationCount":"3","platform":"Semanticscholar","ListUrlMain":"https://doi.org/10.2991/ijcis.d.210520.001"}
The Value Function with Regret Minimization Algorithm for Solving the Nash Equilibrium of Multi-Agent Stochastic Game
In this paper, we study a value function with regret minimization algorithm for solving the Nash equilibrium of a multi-agent stochastic game (MASG). First, the idea of regret minimization is introduced into the value function, and the value function with regret minimization algorithm is designed. Next, we analyze the effect of the discount factor on the expected payoff. Finally, a single-agent stochastic game and the spatial prisoner's dilemma (SPD) are investigated to support the theoretical results. The simulation results show that the cooperation strategy is dominant when the temptation parameter is small, and the defection strategy is dominant when the temptation parameter is large. The level of cooperation between agents can therefore be improved by setting an appropriate temptation parameter.
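The abstract does not state the update rules, so the following is only a minimal sketch of the two ingredients it names, not the authors' algorithm. It assumes standard regret matching (play each action in proportion to its positive cumulative regret) in symmetric self-play, a discounted payoff of the form sum over t of gamma^t r_t, and a weak prisoner's dilemma payoff matrix with reward 1, temptation b, and 0 otherwise. The function names and the payoff matrix are hypothetical choices made for illustration.

```python
import numpy as np


def discounted_return(rewards, gamma):
    """Discounted payoff: sum of gamma**t * r_t over a finite reward sequence."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def regret_matching_strategy(cum_regret):
    """Mixed strategy proportional to positive cumulative regret (uniform if none)."""
    positive = np.maximum(cum_regret, 0.0)
    total = positive.sum()
    if total > 0:
        return positive / total
    return np.full(len(cum_regret), 1.0 / len(cum_regret))


def self_play(payoff, iterations=1000):
    """Symmetric self-play using expected (non-sampled) regret updates.

    payoff[i, j] is the row player's payoff for action i against action j.
    Both players share one strategy because the game is assumed symmetric.
    Returns the time-averaged strategy.
    """
    n = payoff.shape[0]
    cum_regret = np.zeros(n)
    cum_strategy = np.zeros(n)
    for _ in range(iterations):
        strategy = regret_matching_strategy(cum_regret)
        cum_strategy += strategy
        action_values = payoff @ strategy      # value of each pure action
        expected = strategy @ action_values    # value of the current mixed strategy
        cum_regret += action_values - expected
    return cum_strategy / iterations


# Assumed payoff matrix (actions: 0 = cooperate, 1 = defect), temptation b.
b = 1.5
prisoners_dilemma = np.array([[1.0, 0.0],
                              [b, 0.0]])
average_strategy = self_play(prisoners_dilemma)
print(average_strategy)
print(discounted_return([1, 1, 1], 0.5))
```

In this well-mixed, one-shot setting defection takes over for any b > 1, because there is no spatial structure. The dependence on the temptation parameter reported in the abstract concerns the spatial game, which this sketch does not model.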