Multiagent Motion Planning Based on Deep Reinforcement Learning in Complex Environments

Dingwei Wu, Kaifang Wan, Xiao-guang Gao, Zijian Hu
{"title":"复杂环境下基于深度强化学习的多智能体运动规划","authors":"Dingwei Wu, Kaifang Wan, Xiao-guang Gao, Zijian Hu","doi":"10.1109/ICCRE51898.2021.9435656","DOIUrl":null,"url":null,"abstract":"When agents in a multiagent system implement motion planning in complex and dynamic environments, model-based planning algorithms have poor adaptability, while intelligent algorithms, such as MADDPG, encounter difficulty in converging when training multiple agents, and the resulting control model has poor stability and robustness. To address the above challenges, this paper proposes a mixed experience multiagent deep deterministic policy gradient algorithm referred to as ME-MADDPG. The algorithm increases the high-quality experience obtained by artificial potential field method and uses dynamic probability to sample from different replay buffers. Simulation experiments have proven that compared to MADDPG, ME-MADDPG greatly improves convergence speed, convergence effect and stability and that ME-MADDPG can efficiently provide shorter and more convenient paths for multiagent systems.","PeriodicalId":382619,"journal":{"name":"2021 6th International Conference on Control and Robotics Engineering (ICCRE)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":"{\"title\":\"Multiagent Motion Planning Based on Deep Reinforcement Learning in Complex Environments\",\"authors\":\"Dingwei Wu, Kaifang Wan, Xiao-guang Gao, Zijian Hu\",\"doi\":\"10.1109/ICCRE51898.2021.9435656\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"When agents in a multiagent system implement motion planning in complex and dynamic environments, model-based planning algorithms have poor adaptability, while intelligent algorithms, such as MADDPG, encounter difficulty in converging when training multiple agents, and the resulting control model has poor stability and robustness. 
To address the above challenges, this paper proposes a mixed experience multiagent deep deterministic policy gradient algorithm referred to as ME-MADDPG. The algorithm increases the high-quality experience obtained by artificial potential field method and uses dynamic probability to sample from different replay buffers. Simulation experiments have proven that compared to MADDPG, ME-MADDPG greatly improves convergence speed, convergence effect and stability and that ME-MADDPG can efficiently provide shorter and more convenient paths for multiagent systems.\",\"PeriodicalId\":382619,\"journal\":{\"name\":\"2021 6th International Conference on Control and Robotics Engineering (ICCRE)\",\"volume\":\"8 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-04-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 6th International Conference on Control and Robotics Engineering (ICCRE)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICCRE51898.2021.9435656\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 6th International Conference on Control and Robotics Engineering (ICCRE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCRE51898.2021.9435656","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4

Abstract

When agents in a multiagent system perform motion planning in complex, dynamic environments, model-based planning algorithms adapt poorly, while learning-based algorithms such as MADDPG struggle to converge when training multiple agents, and the resulting control models lack stability and robustness. To address these challenges, this paper proposes a mixed-experience multiagent deep deterministic policy gradient algorithm, referred to as ME-MADDPG. The algorithm augments training with high-quality experience generated by the artificial potential field method and uses a dynamic probability to sample from the different replay buffers. Simulation experiments show that, compared with MADDPG, ME-MADDPG greatly improves convergence speed, convergence quality, and stability, and that it can efficiently produce shorter and more practical paths for multiagent systems.
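The two ingredients the abstract names — expert experience from an artificial potential field (APF) and dynamic-probability sampling across replay buffers — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the gain constants, and the linear decay schedule for the expert-sampling probability are all assumptions, since the paper does not publish its code.

```python
import random
import numpy as np

def apf_action(pos, goal, obstacles, k_att=1.0, k_rep=100.0, d0=2.0):
    """Artificial-potential-field action: attraction toward the goal plus
    repulsion from any obstacle closer than the influence distance d0.
    Gains k_att, k_rep and distance d0 are illustrative choices."""
    pos, goal = np.asarray(pos, float), np.asarray(goal, float)
    force = k_att * (goal - pos)              # attractive component
    for obs in obstacles:
        diff = pos - np.asarray(obs, float)
        d = np.linalg.norm(diff)
        if 0.0 < d < d0:
            # repulsive component grows sharply as the agent nears the obstacle
            force += k_rep * (1.0 / d - 1.0 / d0) / d**3 * diff
    return force / (np.linalg.norm(force) + 1e-8)  # unit direction vector

def sample_batch(expert_buffer, agent_buffer, step, total_steps, batch=32):
    """Dynamic-probability mixed sampling: early in training draw mostly
    APF (expert) transitions, then anneal toward the agents' own experience.
    The linear decay with a 0.1 floor is an assumed schedule."""
    p_expert = max(0.1, 1.0 - step / total_steps)
    return [random.choice(expert_buffer if random.random() < p_expert
                          else agent_buffer)
            for _ in range(batch)]
```

In this sketch the APF controller fills `expert_buffer` with demonstration-like transitions, while the agents' own rollouts fill `agent_buffer`; the annealed `p_expert` lets the critic learn from good trajectories early without locking the policy to the potential field's local minima later.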