Deep Multitask Multiagent Reinforcement Learning With Knowledge Transfer

IEEE Transactions on Games; Pub Date: 2023-09-19; DOI: 10.1109/TG.2023.3316697; IF 1.7; Zone 4 (Computer Science); JCR Q3 (Computer Science, Artificial Intelligence)

Yuxiang Mai;Yifan Zang;Qiyue Yin;Wancheng Ni;Kaiqi Huang

Volume 16, issue 3, pages 566-576. Journal article. Full record: https://ieeexplore.ieee.org/document/10255234/

Citations: 0

Abstract




Despite the potential of multiagent reinforcement learning (MARL) in addressing numerous complex tasks, training a single team of MARL agents to handle multiple diverse team tasks remains a challenge. In this article, we introduce a novel Multitask method based on Knowledge Transfer in cooperative MARL (MKT-MARL). By learning from task-specific teachers, our approach empowers a single team of agents to attain expert-level performance in multiple tasks. MKT-MARL utilizes a knowledge distillation algorithm specifically designed for the multiagent architecture, which rapidly learns a team control policy incorporating common coordinated knowledge from the experience of task-specific teachers. In addition, we enhance this training with teacher annealing, gradually shifting the model's learning from distillation toward environmental rewards. This enhancement helps the multitask model surpass its single-task teachers. We extensively evaluate our algorithm using two commonly used benchmarks: StarCraft II micromanagement and the multiagent particle environment. The experimental results demonstrate that our algorithm outperforms both the single-task teachers and a jointly trained team of agents. Extensive ablation experiments illustrate the effectiveness of the supervised knowledge transfer and the teacher annealing strategy.
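
The abstract describes teacher annealing only at a high level: the training signal shifts gradually from distillation toward environmental rewards. The sketch below is one possible reading of that idea, not the authors' implementation. The linear schedule, the function names `teacher_weight` and `annealed_loss`, and the parameter `anneal_steps` are all assumptions made for illustration; the abstract does not state which schedule or loss terms MKT-MARL uses.

```python
def teacher_weight(step: int, anneal_steps: int) -> float:
    """Hypothetical linear schedule (an assumption, not from the paper).

    Returns 1.0 at step 0 and decays linearly to 0.0 at anneal_steps,
    staying at 0.0 afterwards.
    """
    if anneal_steps <= 0:
        return 0.0
    return max(0.0, 1.0 - step / anneal_steps)


def annealed_loss(distill_loss: float, rl_loss: float,
                  step: int, anneal_steps: int) -> float:
    """Blend a distillation loss with a reward-driven RL loss.

    Early in training the teacher (distillation) term dominates; as the
    weight decays, the environment-reward term takes over.
    """
    w = teacher_weight(step, anneal_steps)
    return w * distill_loss + (1.0 - w) * rl_loss
```

Under these assumptions, a call such as `annealed_loss(distill_loss, rl_loss, step, anneal_steps)` would be evaluated once per update, so the student imitates the task-specific teachers at first and relies only on environment rewards after `anneal_steps` updates.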
