Title: Multi-agent Proximal Policy Optimization via Non-fixed Value Clipping
Authors: Chiqiang Liu, Dazi Li
DOI: 10.1109/DDCLS58216.2023.10167264 (https://doi.org/10.1109/DDCLS58216.2023.10167264)
Published in: 2023 IEEE 12th Data Driven Control and Learning Systems Conference (DDCLS), 2023-05-12
Citations: 0
Abstract
Multi-agent reinforcement learning (MARL) has seen wide application and its development has become increasingly mature. Multi-agent Proximal Policy Optimization (MAPPO), an extension of the Proximal Policy Optimization (PPO) algorithm, has attracted researchers' attention with its superior performance. However, as the number of agents in cooperative multi-agent tasks grows, the fixed clip range that limits the update step size leads to overfitting and suboptimal policies. This paper proposes the MAPPO via Non-fixed Value Clipping (NVC-MAPPO) algorithm, which builds on MAPPO by introducing Gaussian noise into the value function and the clipping function, respectively, rewriting the latter into a form called the non-fixed value clipping function. Experiments on the StarCraft II Multi-Agent Challenge (SMAC) verify that the algorithm effectively prevents the update step size from changing too much while enhancing the agents' exploration ability, improving performance over MAPPO.
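The abstract only names the idea of a non-fixed clip range; the paper's exact formulation is not given here. As a minimal illustrative sketch, one way to realize it is to perturb PPO's clip parameter with Gaussian noise each update, so the clipped surrogate objective uses a randomized interval instead of a fixed one. All function and parameter names below are hypothetical, not taken from the paper.

```python
import numpy as np

def nvc_clipped_objective(ratio, advantage, eps=0.2, sigma=0.05, rng=None):
    """PPO clipped surrogate with a non-fixed (noise-perturbed) clip range.

    Sketch under stated assumptions: Gaussian noise with std `sigma`
    widens or narrows the clip interval per call; with sigma=0 this
    reduces to the standard fixed-epsilon PPO clipping.
    """
    rng = rng or np.random.default_rng(0)
    # Perturb the clip range; keep it non-negative so the interval stays valid.
    eps_t = max(eps + rng.normal(0.0, sigma), 0.0)
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps_t, 1.0 + eps_t) * advantage
    # Standard PPO pessimistic minimum over the two surrogates.
    return np.minimum(unclipped, clipped)
```

With `sigma=0.0` the function behaves exactly like the fixed-clip PPO objective, e.g. a probability ratio of 1.5 with advantage 1.0 and `eps=0.2` is clipped to 1.2.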