A hybrid P2P and master-slave cooperative distributed multi-agent reinforcement learning technique with asynchronously triggered exploratory trials and clutter-index-based selected sub-goals
{"title":"A hybrid P2P and master-slave cooperative distributed multi-agent reinforcement learning technique with asynchronously triggered exploratory trials and clutter-index-based selected sub-goals","authors":"D. Megherbi, Minsuk Kim","doi":"10.1109/CIVEMSA.2016.7524249","DOIUrl":null,"url":null,"abstract":"In many large infrastructures, such as military battlefields, transportation and maritime systems spanning hundreds of miles at a time, collaborative multi-agent based monitoring is important. Agent Reinforcement Learning (RL), in general, becomes more challenging in a dynamic complex cluttered environment for autonomous path planning, where agents could be moving randomly to reach their respective goals. In our previous work we presented a hybrid master-slave and peer-to-peer system architecture, where each distributed agent knows only of a given master node, is only concerned with its assigned work load, has a limited knowledge of the environment and can, collaboratively with other agents, share learned information of the environment over a communication network. In this paper we extend our previous work and focus on (a) the study of the performance of said system and the effect of the agents' random walks on the overall system agent learning speed, when each of the distributed agents, after the random walk phase, starts its exploratory trials independently of the other agents, asynchronously, and immediately after it finishes its first exploratory trial towards a sub-goal or after its random walk phase, without waiting for the slowest agent to finish its first random walk or its first exploratory phase toward a sub-goal. (b) the effect on the agent learning speed, of using an environment-clutter-index to select agent sub-goals with the aim of reducing the agent initial random walk steps and (c) the effect of agent sharing/or not sharing environment information on the agent learning speed in such scenarios.","PeriodicalId":244122,"journal":{"name":"2016 IEEE International Conference on Computational Intelligence and Virtual Environments for Measurement Systems and Applications (CIVEMSA)","volume":"66 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 IEEE International Conference on Computational Intelligence and Virtual Environments for Measurement Systems and Applications (CIVEMSA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CIVEMSA.2016.7524249","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 3
Abstract
In many large infrastructures, such as military battlefields and transportation and maritime systems spanning hundreds of miles at a time, collaborative multi-agent-based monitoring is important. Agent reinforcement learning (RL) generally becomes more challenging for autonomous path planning in a dynamic, complex, cluttered environment, where agents may move randomly to reach their respective goals. In our previous work we presented a hybrid master-slave and peer-to-peer system architecture, where each distributed agent knows only of a given master node, is concerned only with its assigned workload, has limited knowledge of the environment, and can collaboratively share learned environment information with other agents over a communication network. In this paper we extend our previous work and focus on (a) the performance of this system and the effect of the agents' random walks on the overall system learning speed, when each distributed agent starts its exploratory trials asynchronously and independently of the other agents, immediately after finishing its random walk phase or its first exploratory trial toward a sub-goal, without waiting for the slowest agent to finish its first random walk or its first exploratory phase toward a sub-goal; (b) the effect on agent learning speed of using an environment clutter index to select agent sub-goals, with the aim of reducing the number of initial random-walk steps; and (c) the effect of agents sharing, or not sharing, environment information on agent learning speed in such scenarios.
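The abstract does not define the environment clutter index or the sub-goal selection rule; the sketch below is a minimal illustration of one plausible interpretation, assuming a 2-D occupancy grid, a clutter index defined as the fraction of occupied cells in a local window, and a candidate sub-goal list. The window size, index definition, and candidate set are assumptions for illustration only, not the authors' method.

```python
import numpy as np

def clutter_index(grid: np.ndarray, center: tuple, window: int = 5) -> float:
    """Fraction of occupied cells in a (window x window) neighborhood of `center`.

    Assumed definition: lower values indicate a less cluttered region,
    which should shorten an agent's initial random walk toward a sub-goal
    placed there.
    """
    r, c = center
    half = window // 2
    r0, r1 = max(0, r - half), min(grid.shape[0], r + half + 1)
    c0, c1 = max(0, c - half), min(grid.shape[1], c + half + 1)
    patch = grid[r0:r1, c0:c1]
    return float(patch.sum()) / patch.size

def select_subgoal(grid: np.ndarray, candidates: list) -> tuple:
    """Pick the candidate sub-goal whose neighborhood has the lowest clutter index."""
    return min(candidates, key=lambda cell: clutter_index(grid, cell))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # 1 = obstacle, 0 = free space; 25% clutter density (hypothetical environment)
    occupancy = (rng.random((20, 20)) < 0.25).astype(int)
    candidate_subgoals = [(3, 3), (10, 15), (17, 5)]
    print("selected sub-goal:", select_subgoal(occupancy, candidate_subgoals))
```

In the asynchronous scheme studied in item (a), each agent would invoke this kind of sub-goal selection and begin its exploratory trials on its own schedule, with no barrier synchronization on the slowest agent's random walk or first exploratory phase.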