End-to-End control of USV swarm using graph centric Multi-Agent Reinforcement Learning

Kanghoon Lee, Kyuree Ahn, Jinkyoo Park
{"title":"基于图中心多智能体强化学习的USV群端到端控制","authors":"Kanghoon Lee, Kyuree Ahn, Jinkyoo Park","doi":"10.23919/ICCAS52745.2021.9649839","DOIUrl":null,"url":null,"abstract":"The Unmanned Surface Vehicles (USVs), which operate without a person at the surface, are used in various naval defense missions. Various missions can be conducted efficiently when a swarm of USVs are operated at the same time. However, it is challenging to establish a decentralised control strategy for all USVs. In addition, the strategy must consider various external factors, such as the ocean topography and the number of enemy forces. These difficulties necessitate a scalable and transferable decision-making module. This study proposes an algorithm to derive the decentralised and cooperative control strategy for the USV swarm using graph centric multi-agent reinforcement learning (MARL). The model first expresses the mission situation using a graph considering the various sensor ranges. Each USV agent encodes observed information into localized embedding and then derives coordinated action through communication with the surrounding agent. To derive a cooperative policy, we trained each agent's policy to maximize the team reward. Using the modified prey-predator environment of OpenAI gym, we have analyzed the effect of each component of the proposed model (state embedding, communication, and team reward). The ablation study shows that the proposed model could derive a scalable and transferable control policy of USVs, consistently achieving the highest win ratio.","PeriodicalId":411064,"journal":{"name":"2021 21st International Conference on Control, Automation and Systems (ICCAS)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"End-to-End control of USV swarm using graph centric Multi-Agent Reinforcement Learning\",\"authors\":\"Kanghoon Lee, Kyuree Ahn, Jinkyoo Park\",\"doi\":\"10.23919/ICCAS52745.2021.9649839\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The Unmanned Surface Vehicles (USVs), which operate without a person at the surface, are used in various naval defense missions. Various missions can be conducted efficiently when a swarm of USVs are operated at the same time. However, it is challenging to establish a decentralised control strategy for all USVs. In addition, the strategy must consider various external factors, such as the ocean topography and the number of enemy forces. These difficulties necessitate a scalable and transferable decision-making module. This study proposes an algorithm to derive the decentralised and cooperative control strategy for the USV swarm using graph centric multi-agent reinforcement learning (MARL). The model first expresses the mission situation using a graph considering the various sensor ranges. Each USV agent encodes observed information into localized embedding and then derives coordinated action through communication with the surrounding agent. To derive a cooperative policy, we trained each agent's policy to maximize the team reward. Using the modified prey-predator environment of OpenAI gym, we have analyzed the effect of each component of the proposed model (state embedding, communication, and team reward). 
The ablation study shows that the proposed model could derive a scalable and transferable control policy of USVs, consistently achieving the highest win ratio.\",\"PeriodicalId\":411064,\"journal\":{\"name\":\"2021 21st International Conference on Control, Automation and Systems (ICCAS)\",\"volume\":\"21 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-10-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 21st International Conference on Control, Automation and Systems (ICCAS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.23919/ICCAS52745.2021.9649839\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 21st International Conference on Control, Automation and Systems (ICCAS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.23919/ICCAS52745.2021.9649839","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 3

Abstract

Unmanned Surface Vehicles (USVs), which operate on the water surface without a crew on board, are used in a variety of naval defense missions. Such missions can be conducted efficiently when a swarm of USVs is operated simultaneously, but it is challenging to establish a decentralised control strategy for all of the USVs. In addition, the strategy must account for various external factors, such as the ocean topography and the number of enemy forces. These difficulties call for a scalable and transferable decision-making module. This study proposes an algorithm that derives a decentralised, cooperative control strategy for a USV swarm using graph-centric multi-agent reinforcement learning (MARL). The model first expresses the mission situation as a graph that accounts for the agents' different sensor ranges. Each USV agent encodes its observed information into a localized embedding and then derives a coordinated action through communication with the surrounding agents. To obtain a cooperative policy, we train each agent's policy to maximize the team reward. Using a modified prey-predator environment from OpenAI Gym, we analyze the effect of each component of the proposed model (state embedding, communication, and team reward). The ablation study shows that the proposed model can derive a scalable and transferable control policy for USVs, consistently achieving the highest win ratio.
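
The pipeline sketched in the abstract — a graph built from the agents' sensor ranges, a localized embedding per USV, and coordinated actions derived by communicating with neighbouring agents — can be made concrete with a short example. The following is a minimal, hypothetical PyTorch sketch of that idea, not the authors' implementation: the layer sizes, the single round of mean-aggregated message passing, and the `sensor_range` threshold used to build the graph are all assumptions made for illustration.

```python
import torch
import torch.nn as nn


class GraphCentricPolicy(nn.Module):
    """Hypothetical sketch of a graph-centric USV policy:
    local observation embedding -> one round of message passing over a
    sensor-range graph -> per-agent action distribution."""

    def __init__(self, obs_dim, hidden_dim, n_actions, sensor_range):
        super().__init__()
        self.sensor_range = sensor_range
        # State embedding: each agent encodes only its own observation.
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden_dim), nn.ReLU())
        # Communication: messages exchanged with agents inside the sensor range.
        self.message = nn.Linear(hidden_dim, hidden_dim)
        # Action head: own embedding concatenated with the aggregated message.
        self.action_head = nn.Linear(2 * hidden_dim, n_actions)

    def forward(self, obs, positions):
        # obs: (n_agents, obs_dim), positions: (n_agents, 2)
        h = self.encoder(obs)

        # Build the communication graph: agent j is a neighbour of agent i
        # if it lies within the sensor range (self-loops removed).
        dist = torch.cdist(positions, positions)
        adj = (dist < self.sensor_range).float()
        adj.fill_diagonal_(0.0)

        # Mean-aggregate the neighbours' messages (one communication round).
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        msg = (adj @ self.message(h)) / deg

        # Each agent acts on its own embedding plus the aggregated message.
        logits = self.action_head(torch.cat([h, msg], dim=-1))
        return torch.distributions.Categorical(logits=logits)


if __name__ == "__main__":
    policy = GraphCentricPolicy(obs_dim=8, hidden_dim=32,
                                n_actions=5, sensor_range=2.0)
    obs = torch.randn(4, 8)        # observations of 4 USV agents
    pos = torch.rand(4, 2) * 5.0   # their 2-D positions
    actions = policy(obs, pos).sample()  # one discrete action per agent
```

Because each agent acts only on its own embedding plus messages from agents within sensor range, a policy of this shape can be applied to swarms of different sizes, which is the property behind the scalability and transferability claims. In the paper, all agents' policies are additionally trained against a shared team reward rather than individual rewards; the ablation study isolates the contribution of this component alongside state embedding and communication.
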
Latest articles in this journal

- Meta Reinforcement Learning Based Underwater Manipulator Control
- Object Detection and Tracking System with Improved DBSCAN Clustering using Radar on Unmanned Surface Vehicle
- A Method for Evaluating of Asymmetry on Cleft Lip Using Symmetry Plane
- Average Blurring-based Anomaly Detection for Vision-based Mask Inspection Systems
- Design and Fabrication of a Robotic Knee-Type Prosthetic Leg with a Two-Way Hydraulic Cylinder