Trajectory Optimization on Safety, Length and Smoothness in Complex Environments with A Locally Trained and Globally Working Agent

Qianyi Zhang, Dingye Yang, Lei Zhou, Zhengxi Hu, Jingtai Liu
Published in: 2022 IEEE International Conference on Real-time Computing and Robotics (RCAR)
Publication date: 2022-07-17
DOI: 10.1109/RCAR54675.2022.9872237 (https://doi.org/10.1109/RCAR54675.2022.9872237)
Citations: 0

Abstract

This paper addresses the balance among safety, length, and smoothness in trajectory optimization and proposes a model that trains an agent with deep reinforcement learning to optimize trajectories in complex environments. The design follows the human habit of first finding the shortest trajectory and then slightly adjusting it for safety and smoothness. The state is initialized as an aggressive ("radical") trajectory combined with the local obstacle distribution. The action jointly adjusts dangerous waypoints. The reward penalizes length increase based on the local change in smoothness. Each episode is terminated early, which divides the whole problem into smaller ones, while the reward assembles them back together given a large amount of training data. The agent can therefore be trained locally and work globally, which accelerates convergence. Results in various scenarios demonstrate the method's ability to balance safety, length, and smoothness. Using the Markov property of the problem and a newly discovered mathematical property of B-splines, the agent adjusts waypoints on a sub-grid map and generalizes stably to various maps with dense obstacles.
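The three quantities being balanced can be illustrated with a minimal sketch. The abstract does not give the paper's definitions, so everything below is an assumption made for illustration: length is the sum of segment lengths, smoothness is a sum of squared second differences, obstacles are points, and a waypoint counts as "dangerous" when it lies closer than a hypothetical `safe_dist` to any obstacle. The function names are likewise hypothetical and not taken from the paper.

```python
import math

def path_length(waypoints):
    """Sum of Euclidean distances between consecutive waypoints."""
    return sum(math.dist(a, b) for a, b in zip(waypoints, waypoints[1:]))

def smoothness_cost(waypoints):
    """Sum of squared second differences (assumed metric, not the paper's).

    The cost is 0 for evenly spaced collinear waypoints and grows as the
    path bends more sharply.
    """
    cost = 0.0
    for p, q, r in zip(waypoints, waypoints[1:], waypoints[2:]):
        cost += sum((a - 2 * b + c) ** 2 for a, b, c in zip(p, q, r))
    return cost

def dangerous_indices(waypoints, obstacles, safe_dist):
    """Indices of waypoints closer than safe_dist to any point obstacle."""
    return [
        i for i, w in enumerate(waypoints)
        if any(math.dist(w, o) < safe_dist for o in obstacles)
    ]

# A shortest (straight) trajectory that passes too close to one obstacle.
obstacles = [(2.0, 0.2)]
straight = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]

# The same trajectory after moving the dangerous waypoint away from the obstacle.
adjusted = [(0.0, 0.0), (1.0, 0.0), (2.0, -0.5), (3.0, 0.0), (4.0, 0.0)]

for name, traj in (("straight", straight), ("adjusted", adjusted)):
    print(
        name,
        "dangerous:", dangerous_indices(traj, obstacles, safe_dist=0.5),
        "length:", round(path_length(traj), 3),
        "smoothness cost:", smoothness_cost(traj),
    )
```

In this toy case, moving the single dangerous waypoint removes the safety violation but increases both the length and the smoothness cost, which is the kind of trade-off the abstract describes the reward as balancing.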