Learning Effective Value Function Factorization via Attentional Communication
Bo Wu, Xiaoya Yang, Chuxiong Sun, Rui Wang, Xiaohui Hu, Yan Hu
2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Oct. 2020, pp. 629–634. DOI: 10.1109/SMC42975.2020.9283355
How to achieve efficient cooperation among agents in partially observable environments remains an overarching problem in multi-agent reinforcement learning (MARL). Value function factorization is a promising approach, as it efficiently addresses the multi-agent credit assignment problem. However, existing value function factorization methods focus on learning fully decentralized value functions, which are ineffective for some complex tasks. To address this limitation, we propose a framework that enhances value function factorization by allowing communication during execution. Communication introduces extra information that helps agents understand the complex environment and learn a sophisticated factorization. Furthermore, the proposed communication mechanism differs from existing methods in that we additionally design a descriptive key to accompany each message. Using the descriptive keys, agents can dynamically measure the importance of different messages and achieve attentional communication. We evaluate our framework on a challenging set of StarCraft II micromanagement tasks and show that it significantly outperforms existing value function factorization methods.
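The core mechanism the abstract describes, each agent broadcasting a message together with a descriptive key while receivers weight incoming messages by attention over those keys, can be sketched in a few lines. Below is a minimal PyTorch sketch under stated assumptions: scaled dot-product attention over the keys, a single network shared across agents, a VDN-style additive mixer, and hypothetical module names and dimensions. It is an illustration of the idea, not the authors' implementation, whose exact architecture and mixing network may differ.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionalCommAgent(nn.Module):
    """One network shared by all agents; names and sizes are hypothetical."""
    def __init__(self, obs_dim, n_actions, hidden=64, msg_dim=32, key_dim=16):
        super().__init__()
        self.encode = nn.Linear(obs_dim, hidden)
        self.msg_head = nn.Linear(hidden, msg_dim)    # message content
        self.key_head = nn.Linear(hidden, key_dim)    # descriptive key for the message
        self.query_head = nn.Linear(hidden, key_dim)  # receiver-side query
        self.q_head = nn.Linear(hidden + msg_dim, n_actions)
        self.key_dim = key_dim

    def forward(self, obs):
        # obs: (n_agents, obs_dim), one partial observation per agent.
        h = F.relu(self.encode(obs))                            # (n, hidden)
        msgs = self.msg_head(h)                                 # (n, msg_dim)
        keys = self.key_head(h)                                 # (n, key_dim)
        queries = self.query_head(h)                            # (n, key_dim)
        # Each receiver scores every incoming key against its own query
        # (scaled dot-product attention), so message importance is dynamic.
        scores = queries @ keys.t() / math.sqrt(self.key_dim)   # (n, n)
        mask = torch.eye(obs.size(0), dtype=torch.bool)
        scores = scores.masked_fill(mask, float("-inf"))        # no self-messages (assumed)
        attn = torch.softmax(scores, dim=-1)                    # per-message weights
        aggregated = attn @ msgs                                # (n, msg_dim)
        return self.q_head(torch.cat([h, aggregated], dim=-1))  # per-agent Q-values

def joint_q(per_agent_q, actions):
    # Additive (VDN-style) factorization: Q_tot = sum_i Q_i(o_i, a_i).
    # The paper's actual mixer may be richer; this is the simplest instance.
    return per_agent_q.gather(1, actions.unsqueeze(1)).sum()

# Usage sketch: 3 agents, 10-dim observations, 5 actions each.
net = AttentionalCommAgent(obs_dim=10, n_actions=5)
q = net(torch.randn(3, 10))
q_tot = joint_q(q, actions=torch.tensor([0, 2, 4]))
```

The point of the descriptive key in this sketch is that a receiver need not treat every message equally: the softmax over query-key scores lets each agent focus on the few messages that are relevant to its current observation, which is what the abstract calls attentional communication.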