Improving Global Parameter-sharing in Physically Heterogeneous Multi-agent Reinforcement Learning with Unified Action Space
Xiaoyang Yu, Youfang Lin, Shuo Wang, Kai Lv, Sheng Han
arXiv - CS - Multiagent Systems, 2024-08-14, arXiv:2408.07395
Abstract
In a multi-agent system (MAS), action semantics indicate the different effects that agents' actions have on other entities, and can be used to divide agents into groups in a physically heterogeneous MAS. Previous multi-agent reinforcement learning (MARL) algorithms apply global parameter-sharing across different types of heterogeneous agents without carefully discriminating between different action semantics, which weakens cooperation and coordination among agents in complex situations. However, fully independent agent parameters dramatically increase computational cost and training difficulty. To benefit from different action semantics while maintaining a proper parameter-sharing structure, we introduce the Unified Action Space (UAS): the union of all agents' actions with their different semantics. Each agent first computes a unified representation in the UAS, and then derives its heterogeneous action policy by applying its group-specific available-action mask. To further improve the training of the extra UAS parameters, we introduce a Cross-Group Inverse (CGI) loss that predicts other groups' agent policies from trajectory information. As a universal approach to the physically heterogeneous MARL problem, we add the UAS to both value-based and policy-based MARL algorithms, yielding two practical algorithms: U-QMIX and U-MAPPO. Experimental results in the SMAC environment demonstrate the effectiveness of both U-QMIX and U-MAPPO compared with several state-of-the-art MARL methods.
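The core UAS idea from the abstract — shared parameters score actions over the union action space, and a per-group available-action mask carves out each group's heterogeneous policy — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the action names, the two hypothetical groups, and the plain-Python softmax are all assumptions made for the example.

```python
import math

# Hypothetical union of all groups' actions (the Unified Action Space).
UAS = ["move", "stop", "attack", "heal"]

# Hypothetical per-group available-action masks (1 = action exists for this group).
GROUP_MASKS = {
    "soldier": [1, 1, 1, 0],  # soldiers can move / stop / attack
    "medic":   [1, 1, 0, 1],  # medics can move / stop / heal
}

def masked_policy(unified_logits, group):
    """Turn one set of shared logits over the UAS into a group-specific
    policy by masking unavailable actions before the softmax."""
    mask = GROUP_MASKS[group]
    neg_inf = float("-inf")
    masked = [l if m else neg_inf for l, m in zip(unified_logits, mask)]
    # Numerically stable softmax over the available actions only.
    z = max(v for v in masked if v != neg_inf)
    exps = [math.exp(v - z) if v != neg_inf else 0.0 for v in masked]
    total = sum(exps)
    return [e / total for e in exps]

# The same parameter-shared logits yield different per-group policies.
logits = [1.0, 0.5, 2.0, 2.0]
p_soldier = masked_policy(logits, "soldier")
p_medic = masked_policy(logits, "medic")
```

Masked-out actions receive exactly zero probability, so a single shared network head suffices for all groups, which is what lets the method keep global parameter-sharing while respecting heterogeneous action semantics.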