Multi-agent Reinforcement Learning in Spatial Domain Tasks using Inter Subtask Empowerment Rewards

Shubham Pateria, Budhitama Subagdja, A. Tan
{"title":"Multi-agent Reinforcement Learning in Spatial Domain Tasks using Inter Subtask Empowerment Rewards","authors":"Shubham Pateria, Budhitama Subagdja, A. Tan","doi":"10.1109/SSCI44817.2019.9002777","DOIUrl":null,"url":null,"abstract":"In the complex multi-agent tasks, various agents must cooperate to distribute relevant subtasks among each other to achieve joint task objectives. An agent’s choice of the relevant subtask changes over time with the changes in the task environment state. Multi-agent Hierarchical Reinforcement Learning (MAHRL) provides an approach for learning to select the subtasks in response to the environment states, by using the joint task rewards to train various agents. When the joint task involves complex inter-agent dependencies, only a subset of agents might be capable of reaching the rewarding task states while other agents take precursory or intermediate roles. The delayed task reward might not be sufficient in such tasks to learn the coordinating policies for various agents. In this paper, we introduce a novel approach of MAHRL called Inter-Subtask Empowerment based Multi-agent Options (ISEMO) in which an Inter-Subtask Empowerment Reward (ISER) is given to an agent which enables the precondition(s) of other agents’ subtasks. ISER is given in addition to the domain task reward in order to improve the inter-agent coordination. ISEMO also incorporates options model that can learn parameterized subtask termination functions and relax the limitations posed by hand-crafted termination conditions. Experiments in a spatial Search and Rescue domain show that ISEMO can learn the subtask selection policies of various agents grounded in the inter-dependencies among the agents, as well as learn the subtask termination conditions, and perform better than the standard MAHRL technique.","PeriodicalId":6729,"journal":{"name":"2019 IEEE Symposium Series on Computational Intelligence (SSCI)","volume":"669 1","pages":"86-93"},"PeriodicalIF":0.0000,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE Symposium Series on Computational Intelligence (SSCI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SSCI44817.2019.9002777","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

In complex multi-agent tasks, various agents must cooperate to distribute relevant subtasks among themselves to achieve joint task objectives. An agent's choice of the relevant subtask changes over time with changes in the task environment state. Multi-agent Hierarchical Reinforcement Learning (MAHRL) provides an approach for learning to select subtasks in response to environment states, using joint task rewards to train the various agents. When the joint task involves complex inter-agent dependencies, only a subset of agents might be capable of reaching the rewarding task states while other agents take precursory or intermediate roles. In such tasks, the delayed task reward might not be sufficient to learn coordinating policies for the various agents. In this paper, we introduce a novel MAHRL approach called Inter-Subtask Empowerment based Multi-agent Options (ISEMO), in which an Inter-Subtask Empowerment Reward (ISER) is given to an agent that enables the precondition(s) of other agents' subtasks. ISER is given in addition to the domain task reward in order to improve inter-agent coordination. ISEMO also incorporates an options model that can learn parameterized subtask termination functions and relax the limitations posed by hand-crafted termination conditions. Experiments in a spatial Search and Rescue domain show that ISEMO can learn the subtask selection policies of the various agents grounded in the inter-dependencies among them, learn the subtask termination conditions, and perform better than the standard MAHRL technique.
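
To make the ISER idea in the abstract concrete, the following Python sketch rewards an agent whose state transition newly satisfies a precondition of another agent's subtask. The `Subtask` structure, the dictionary-based state, the precondition predicates, and the bonus magnitude are illustrative assumptions for this sketch, not details taken from the paper.

```python
"""Minimal sketch of an Inter-Subtask Empowerment Reward (ISER): an agent
receives an auxiliary reward when its transition enables a precondition of
another agent's subtask. All interfaces here are hypothetical."""

from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Subtask:
    name: str
    owner: str  # agent that executes this subtask
    # Predicates over a dict-based environment state that must hold
    # before the subtask can be attempted.
    preconditions: List[Callable[[Dict], bool]] = field(default_factory=list)


def iser_bonus(acting_agent: str,
               prev_state: Dict,
               next_state: Dict,
               all_subtasks: List[Subtask],
               bonus: float = 1.0) -> float:
    """Count preconditions of *other* agents' subtasks that were false in
    prev_state but hold in next_state, i.e. newly enabled by this agent."""
    reward = 0.0
    for subtask in all_subtasks:
        if subtask.owner == acting_agent:
            continue  # only reward enabling other agents' subtasks
        for pre in subtask.preconditions:
            if not pre(prev_state) and pre(next_state):
                reward += bonus
    return reward


if __name__ == "__main__":
    # Toy Search-and-Rescue flavoured example: a "clearer" agent removing
    # debris enables the precondition of a "rescuer" agent's evacuation subtask.
    evacuate = Subtask(
        name="evacuate_victim",
        owner="rescuer",
        preconditions=[lambda s: s["path_clear"]],
    )
    prev = {"path_clear": False}
    nxt = {"path_clear": True}
    print(iser_bonus("clearer", prev, nxt, [evacuate]))  # -> 1.0 (ISER granted)
```

In a training loop following the abstract's description, such a bonus would simply be added to the domain task reward for the enabling agent's transition before updating its subtask-selection policy.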