Multi-Task Multi-Agent Reinforcement Learning With Task-Entity Transformers and Value Decomposition Training

IEEE Transactions on Automation Science and Engineering · IF 6.4 · JCR Q1 (Automation & Control Systems) · CAS Region 2 (Computer Science) · Published: 2024-11-21 · DOI: 10.1109/TASE.2024.3501580
Yuanheng Zhu;Shangjing Huang;Binbin Zuo;Dongbin Zhao;Changyin Sun
{"title":"Multi-Task Multi-Agent Reinforcement Learning With Task-Entity Transformers and Value Decomposition Training","authors":"Yuanheng Zhu;Shangjing Huang;Binbin Zuo;Dongbin Zhao;Changyin Sun","doi":"10.1109/TASE.2024.3501580","DOIUrl":null,"url":null,"abstract":"Multi-task multi-agent reinforcement learning aims to control multiple agents to perform well on multiple tasks. It encounters three core challenges: the varying number of agents and entities, the disparities in cooperative behaviors among different tasks, and the training imbalance caused by varying task difficulty levels. To address these issues, we propose a novel framework named Task-Entity Transformer Qmix (TETQmix), which employs pretrained language models for task encoding, utilizes proposed Task-Entity Transformer to handle observations across various tasks, and adjusts task learning weights to achieve balanced multi-task training. Task-Entity Transformer not only enables handling multi-task scenarios with varying numbers of agents and entities, but also leverages cross-attention modules to integrate observation and task embeddings, so that each agent can obtain individual values and decisions for multiple tasks. We then utilize a transformer-based mixer to monotonically combine the individual values, and train the whole network’s parameters using temporal-difference errors. To facilitate multi-task training, we define task regret as the difference between the current-stage return and the candidate best one, and adjust the learning weight of each task based on its task regret. Experiments are conducted on both simulated multi-particle environments and real-world multi-robot systems. Compared with existing baselines, our method not only is superior in multi-task learning efficiency, but also shows promising transfer ability on unseen tasks. Note to Practitioners—The flexibility of multi-agent systems makes them quite fit to multiple tasks. 
Compared to designing different decision models for different tasks, it is more convenient if one can use just one decision model to resolve multiple tasks. Besides, it can make the maximum utilization of trajectory data coming from similar tasks when the data are integrated for multi-task decision model training. Natural language provides a powerful tool to describe the task context and emphasize the similarities or differences among different tasks. Pretrained language models can encode the task context, based on which the decision model can adjust its output distribution for different tasks and even synthesize the decisions from existing and similar tasks to achieve promising zero-shot and few-shot transfer performance for unseen tasks. With our proposed TETQmix, practitioners are able to realize multi-task capability in multi-agent systems and increase the generalization in a variety of scenarios.","PeriodicalId":51060,"journal":{"name":"IEEE Transactions on Automation Science and Engineering","volume":"22 ","pages":"9164-9177"},"PeriodicalIF":6.4000,"publicationDate":"2024-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Automation Science and Engineering","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10759737/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"AUTOMATION & CONTROL SYSTEMS","Score":null,"Total":0}
Citation count: 0

Abstract

Multi-task multi-agent reinforcement learning aims to control multiple agents to perform well on multiple tasks. It encounters three core challenges: the varying number of agents and entities, the disparities in cooperative behaviors among different tasks, and the training imbalance caused by varying task difficulty levels. To address these issues, we propose a novel framework named Task-Entity Transformer Qmix (TETQmix), which employs pretrained language models for task encoding, utilizes the proposed Task-Entity Transformer to handle observations across various tasks, and adjusts task learning weights to achieve balanced multi-task training. The Task-Entity Transformer not only enables handling multi-task scenarios with varying numbers of agents and entities, but also leverages cross-attention modules to integrate observation and task embeddings, so that each agent can obtain individual values and decisions for multiple tasks. We then utilize a transformer-based mixer to monotonically combine the individual values, and train the whole network's parameters using temporal-difference errors. To facilitate multi-task training, we define task regret as the difference between the current-stage return and the candidate best one, and adjust the learning weight of each task based on its task regret. Experiments are conducted on both simulated multi-particle environments and real-world multi-robot systems. Compared with existing baselines, our method is not only superior in multi-task learning efficiency, but also shows promising transfer ability on unseen tasks. Note to Practitioners—The flexibility of multi-agent systems makes them well suited to multiple tasks. Compared to designing different decision models for different tasks, it is more convenient to use a single decision model to resolve multiple tasks. Besides, this makes maximal use of trajectory data from similar tasks when the data are integrated for multi-task decision model training.
Natural language provides a powerful tool to describe the task context and emphasize the similarities or differences among different tasks. Pretrained language models can encode the task context, based on which the decision model can adjust its output distribution for different tasks and even synthesize the decisions from existing and similar tasks to achieve promising zero-shot and few-shot transfer performance on unseen tasks. With our proposed TETQmix, practitioners are able to realize multi-task capability in multi-agent systems and improve generalization across a variety of scenarios.
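The regret-based task weighting described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the softmax mapping from regret to weight and the `temperature` knob are assumptions; the abstract only specifies that each task's learning weight is adjusted based on its task regret, the gap between the current-stage return and the candidate best return.

```python
import numpy as np

def task_regret_weights(current_returns, best_returns, temperature=1.0):
    """Hypothetical sketch of regret-based multi-task loss weighting.

    regret_i = best_return_i - current_return_i, so a harder task (one
    still far from its candidate best return) receives a larger weight.
    A softmax normalizes the regrets into weights; `temperature` is an
    assumed smoothing knob, not taken from the paper.
    """
    regrets = np.asarray(best_returns, float) - np.asarray(current_returns, float)
    logits = regrets / temperature
    logits -= logits.max()  # shift for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

# The middle task lags its candidate best the most, so it dominates.
w = task_regret_weights(current_returns=[8.0, 3.0, 9.5],
                        best_returns=[10.0, 10.0, 10.0])
```

In a training loop, such weights would scale each task's temporal-difference loss before the gradient step, so under-performing tasks are emphasized without hand-tuning per-task coefficients.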
Source journal: IEEE Transactions on Automation Science and Engineering (Engineering & Technology - Automation & Control Systems)
CiteScore: 12.50
Self-citation rate: 14.30%
Annual publications: 404
Review time: 3.0 months
Journal description: The IEEE Transactions on Automation Science and Engineering (T-ASE) publishes fundamental papers on Automation, emphasizing scientific results that advance efficiency, quality, productivity, and reliability. T-ASE encourages interdisciplinary approaches from computer science, control systems, electrical engineering, mathematics, mechanical engineering, operations research, and other fields. T-ASE welcomes results relevant to industries such as agriculture, biotechnology, healthcare, home automation, maintenance, manufacturing, pharmaceuticals, retail, security, service, supply chains, and transportation. T-ASE addresses a research community willing to integrate knowledge across disciplines and industries. For this purpose, each paper includes a Note to Practitioners that summarizes how its results can be applied or how they might be extended to apply in practice.
Latest articles in this journal:
Sensorless Robotic External Force Estimation in Uncertain Interactive Environments: A Hybrid Adaptive-Robust Kalman Filtering Approach
Prescribed-Time Critic-Only Consensus for Constrained Multiagent Systems Through ADP
Hierarchical Distributed Optimal Safety-Critical Consensus of Multi-Robot Systems in Dynamic Environments
HECTOR: Human-centric Hierarchical Coordination and Supervision of Robotic Fleets under Continual Temporal Tasks
Finite-Time Flexible Performance-Based Formation Control for Wheeled Humanoid Robots with Dynamic Obstacle Avoidance