Multi-Task Multi-Agent Reinforcement Learning With Task-Entity Transformers and Value Decomposition Training

IEEE Transactions on Automation Science and Engineering · IF 6.4 · JCR Q1 (Automation & Control Systems) · CAS Region 2 (Computer Science) · Published: 2024-11-21 · DOI: 10.1109/TASE.2024.3501580
Yuanheng Zhu;Shangjing Huang;Binbin Zuo;Dongbin Zhao;Changyin Sun
{"title":"Multi-Task Multi-Agent Reinforcement Learning With Task-Entity Transformers and Value Decomposition Training","authors":"Yuanheng Zhu;Shangjing Huang;Binbin Zuo;Dongbin Zhao;Changyin Sun","doi":"10.1109/TASE.2024.3501580","DOIUrl":null,"url":null,"abstract":"Multi-task multi-agent reinforcement learning aims to control multiple agents to perform well on multiple tasks. It encounters three core challenges: the varying number of agents and entities, the disparities in cooperative behaviors among different tasks, and the training imbalance caused by varying task difficulty levels. To address these issues, we propose a novel framework named Task-Entity Transformer Qmix (TETQmix), which employs pretrained language models for task encoding, utilizes proposed Task-Entity Transformer to handle observations across various tasks, and adjusts task learning weights to achieve balanced multi-task training. Task-Entity Transformer not only enables handling multi-task scenarios with varying numbers of agents and entities, but also leverages cross-attention modules to integrate observation and task embeddings, so that each agent can obtain individual values and decisions for multiple tasks. We then utilize a transformer-based mixer to monotonically combine the individual values, and train the whole network’s parameters using temporal-difference errors. To facilitate multi-task training, we define task regret as the difference between the current-stage return and the candidate best one, and adjust the learning weight of each task based on its task regret. Experiments are conducted on both simulated multi-particle environments and real-world multi-robot systems. Compared with existing baselines, our method not only is superior in multi-task learning efficiency, but also shows promising transfer ability on unseen tasks. Note to Practitioners—The flexibility of multi-agent systems makes them quite fit to multiple tasks. 
Compared to designing different decision models for different tasks, it is more convenient if one can use just one decision model to resolve multiple tasks. Besides, it can make the maximum utilization of trajectory data coming from similar tasks when the data are integrated for multi-task decision model training. Natural language provides a powerful tool to describe the task context and emphasize the similarities or differences among different tasks. Pretrained language models can encode the task context, based on which the decision model can adjust its output distribution for different tasks and even synthesize the decisions from existing and similar tasks to achieve promising zero-shot and few-shot transfer performance for unseen tasks. With our proposed TETQmix, practitioners are able to realize multi-task capability in multi-agent systems and increase the generalization in a variety of scenarios.","PeriodicalId":51060,"journal":{"name":"IEEE Transactions on Automation Science and Engineering","volume":"22 ","pages":"9164-9177"},"PeriodicalIF":6.4000,"publicationDate":"2024-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Automation Science and Engineering","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10759737/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"AUTOMATION & CONTROL SYSTEMS","Score":null,"Total":0}
Citation count: 0

Abstract

Multi-task multi-agent reinforcement learning aims to control multiple agents to perform well on multiple tasks. It encounters three core challenges: the varying number of agents and entities, the disparities in cooperative behaviors among different tasks, and the training imbalance caused by varying task difficulty levels. To address these issues, we propose a novel framework named Task-Entity Transformer Qmix (TETQmix), which employs pretrained language models for task encoding, utilizes the proposed Task-Entity Transformer to handle observations across various tasks, and adjusts task learning weights to achieve balanced multi-task training. The Task-Entity Transformer not only enables handling multi-task scenarios with varying numbers of agents and entities, but also leverages cross-attention modules to integrate observation and task embeddings, so that each agent can obtain individual values and decisions for multiple tasks. We then utilize a transformer-based mixer to monotonically combine the individual values, and train the whole network's parameters using temporal-difference errors. To facilitate multi-task training, we define task regret as the difference between the current-stage return and the candidate best one, and adjust the learning weight of each task based on its task regret. Experiments are conducted on both simulated multi-particle environments and real-world multi-robot systems. Compared with existing baselines, our method is not only superior in multi-task learning efficiency, but also shows promising transfer ability on unseen tasks. Note to Practitioners—The flexibility of multi-agent systems makes them well suited to multiple tasks. Compared to designing different decision models for different tasks, it is more convenient to use a single decision model to resolve multiple tasks. Besides, this makes maximal use of trajectory data from similar tasks when the data are integrated for multi-task decision model training.
Natural language provides a powerful tool to describe the task context and emphasize the similarities or differences among different tasks. Pretrained language models can encode the task context, based on which the decision model can adjust its output distribution for different tasks and even synthesize the decisions from existing and similar tasks to achieve promising zero-shot and few-shot transfer performance on unseen tasks. With our proposed TETQmix, practitioners are able to realize multi-task capability in multi-agent systems and improve generalization across a variety of scenarios.
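The regret-based task weighting described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the softmax mapping from regret to weight and the `temperature` knob are assumptions; the abstract only specifies that each task's learning weight is adjusted based on its task regret, the gap between the current-stage return and the candidate best return.

```python
import numpy as np

def task_regret_weights(current_returns, best_returns, temperature=1.0):
    """Hypothetical sketch of regret-based multi-task loss weighting.

    regret_i = best_return_i - current_return_i, so a harder task (one
    still far from its candidate best return) receives a larger weight.
    A softmax normalizes the regrets into weights; `temperature` is an
    assumed smoothing knob, not taken from the paper.
    """
    regrets = np.asarray(best_returns, float) - np.asarray(current_returns, float)
    logits = regrets / temperature
    logits -= logits.max()  # shift for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

# The middle task lags its candidate best the most, so it dominates.
w = task_regret_weights(current_returns=[8.0, 3.0, 9.5],
                        best_returns=[10.0, 10.0, 10.0])
```

In a training loop, such weights would scale each task's temporal-difference loss before the gradient step, so under-performing tasks are emphasized without hand-tuning per-task coefficients.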
Source journal: IEEE Transactions on Automation Science and Engineering (Engineering & Technology - Automation & Control Systems)
CiteScore: 12.50
Self-citation rate: 14.30%
Annual publications: 404
Review time: 3.0 months
Journal description: The IEEE Transactions on Automation Science and Engineering (T-ASE) publishes fundamental papers on Automation, emphasizing scientific results that advance efficiency, quality, productivity, and reliability. T-ASE encourages interdisciplinary approaches from computer science, control systems, electrical engineering, mathematics, mechanical engineering, operations research, and other fields. T-ASE welcomes results relevant to industries such as agriculture, biotechnology, healthcare, home automation, maintenance, manufacturing, pharmaceuticals, retail, security, service, supply chains, and transportation. T-ASE addresses a research community willing to integrate knowledge across disciplines and industries. For this purpose, each paper includes a Note to Practitioners that summarizes how its results can be applied or how they might be extended to apply in practice.
Latest articles in this journal:
Sensorless Robotic External Force Estimation in Uncertain Interactive Environments: A Hybrid Adaptive-Robust Kalman Filtering Approach
Prescribed-Time Critic-Only Consensus for Constrained Multiagent Systems Through ADP
Hierarchical Distributed Optimal Safety-Critical Consensus of Multi-Robot Systems in Dynamic Environments
HECTOR: Human-centric Hierarchical Coordination and Supervision of Robotic Fleets under Continual Temporal Tasks
Finite-Time Flexible Performance-Based Formation Control for Wheeled Humanoid Robots with Dynamic Obstacle Avoidance