A hybrid P2P and master-slave cooperative distributed multi-agent reinforcement learning technique with asynchronously triggered exploratory trials and clutter-index-based selected sub-goals
{"title":"A hybrid P2P and master-slave cooperative distributed multi-agent reinforcement learning technique with asynchronously triggered exploratory trials and clutter-index-based selected sub-goals","authors":"D. Megherbi, Minsuk Kim","doi":"10.1109/CIVEMSA.2016.7524249","DOIUrl":null,"url":null,"abstract":"In many large infrastructures, such as military battlefields, transportation and maritime systems spanning hundreds of miles at a time, collaborative multi-agent based monitoring is important. Agent Reinforcement Learning (RL), in general, becomes more challenging in a dynamic complex cluttered environment for autonomous path planning, where agents could be moving randomly to reach their respective goals. In our previous work we presented a hybrid master-slave and peer-to-peer system architecture, where each distributed agent knows only of a given master node, is only concerned with its assigned work load, has a limited knowledge of the environment and can, collaboratively with other agents, share learned information of the environment over a communication network. In this paper we extend our previous work and focus on (a) the study of the performance of said system and the effect of the agents' random walks on the overall system agent learning speed, when each of the distributed agents, after the random walk phase, starts its exploratory trials independently of the other agents, asynchronously, and immediately after it finishes its first exploratory trial towards a sub-goal or after its random walk phase, without waiting for the slowest agent to finish its first random walk or its first exploratory phase toward a sub-goal. (b) the effect on the agent learning speed, of using an environment-clutter-index to select agent sub-goals with the aim of reducing the agent initial random walk steps and (c) the effect of agent sharing/or not sharing environment information on the agent learning speed in such scenarios.","PeriodicalId":244122,"journal":{"name":"2016 IEEE International Conference on Computational Intelligence and Virtual Environments for Measurement Systems and Applications (CIVEMSA)","volume":"66 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 IEEE International Conference on Computational Intelligence and Virtual Environments for Measurement Systems and Applications (CIVEMSA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CIVEMSA.2016.7524249","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 3
Abstract
In many large infrastructures, such as military battlefields and transportation and maritime systems spanning hundreds of miles at a time, collaborative multi-agent-based monitoring is important. Agent reinforcement learning (RL) generally becomes more challenging for autonomous path planning in a dynamic, complex, cluttered environment, where agents may move randomly to reach their respective goals. In our previous work we presented a hybrid master-slave and peer-to-peer system architecture, where each distributed agent knows only of a given master node, is concerned only with its assigned workload, has limited knowledge of the environment, and can collaboratively share learned environment information with other agents over a communication network. In this paper we extend our previous work and focus on (a) the performance of this system and the effect of the agents' random walks on the overall system learning speed, when each distributed agent starts its exploratory trials asynchronously and independently of the other agents, immediately after finishing its random walk phase or its first exploratory trial toward a sub-goal, without waiting for the slowest agent to finish its first random walk or its first exploratory phase toward a sub-goal; (b) the effect on agent learning speed of using an environment clutter index to select agent sub-goals, with the aim of reducing the number of initial random-walk steps; and (c) the effect of agents sharing, or not sharing, environment information on agent learning speed in such scenarios.
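The abstract does not define the environment clutter index or the sub-goal selection rule; the sketch below is a minimal illustration of one plausible interpretation, assuming a 2-D occupancy grid, a clutter index defined as the fraction of occupied cells in a local window, and a candidate sub-goal list. The window size, index definition, and candidate set are assumptions for illustration only, not the authors' method.

```python
import numpy as np

def clutter_index(grid: np.ndarray, center: tuple, window: int = 5) -> float:
    """Fraction of occupied cells in a (window x window) neighborhood of `center`.

    Assumed definition: lower values indicate a less cluttered region,
    which should shorten an agent's initial random walk toward a sub-goal
    placed there.
    """
    r, c = center
    half = window // 2
    r0, r1 = max(0, r - half), min(grid.shape[0], r + half + 1)
    c0, c1 = max(0, c - half), min(grid.shape[1], c + half + 1)
    patch = grid[r0:r1, c0:c1]
    return float(patch.sum()) / patch.size

def select_subgoal(grid: np.ndarray, candidates: list) -> tuple:
    """Pick the candidate sub-goal whose neighborhood has the lowest clutter index."""
    return min(candidates, key=lambda cell: clutter_index(grid, cell))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # 1 = obstacle, 0 = free space; 25% clutter density (hypothetical environment)
    occupancy = (rng.random((20, 20)) < 0.25).astype(int)
    candidate_subgoals = [(3, 3), (10, 15), (17, 5)]
    print("selected sub-goal:", select_subgoal(occupancy, candidate_subgoals))
```

In the asynchronous scheme studied in item (a), each agent would invoke this kind of sub-goal selection and begin its exploratory trials on its own schedule, with no barrier synchronization on the slowest agent's random walk or first exploratory phase.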