RSMDP-Based Robust Q-Learning for Optimal Path Planning in a Dynamic Environment

Yunfei Zhang, Weilin Li, C. D. Silva
Journal: Int. J. Robotics Autom. Publication date: 2014-03-01. DOI: https://doi.org/10.2316/Journal.206.2016.4.206-4255
Citations: 9

Abstract

This paper presents a robust Q-learning method for path planning in a dynamic environment. The method consists of three steps: first, a regime-switching Markov decision process (RSMDP) is formed to represent the dynamic environment; second, a probabilistic roadmap (PRM) is constructed, integrated with the RSMDP, and stored as a graph whose nodes each correspond to a collision-free world state for the robot; and third, an online Q-learning method with a dynamic step size, which facilitates robust convergence of the Q-value iteration, is integrated with the PRM to determine an optimal path for reaching the goal. In this manner, the robot is able to use past experience to improve its performance in avoiding not only static obstacles but also moving obstacles, without knowing the nature of the obstacle motion. The use of regime switching in the avoidance of obstacles with unknown motion is particularly innovative. The developed approach is applied to a homecare robot in computer simulation. The results show that the online path planner with Q-learning is able to rapidly and successfully converge to the correct path.
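The third step can be illustrated with a minimal sketch of tabular Q-learning on a roadmap graph with a decaying step size. The abstract does not give the paper's step-size rule, reward design, or RSMDP construction, so everything named here is an assumption made for illustration: the six-node `ROADMAP`, the reward values, the discount factor, and the visit-count rule in `step_size`. Regime switching and moving obstacles are not modelled, and each roadmap edge is replayed once per sweep to keep the run deterministic, whereas an online planner would use the transitions the robot actually experiences.

```python
from collections import defaultdict

# Hypothetical roadmap: node -> neighbouring collision-free nodes.
ROADMAP = {
    0: [1, 2],
    1: [0, 5],
    2: [0, 3],
    3: [2, 4],
    4: [3, 5],
    5: [],  # goal node, treated as terminal
}
GOAL = 5
GAMMA = 0.9  # discount factor (assumed)


def step_size(visits):
    """Assumed decaying rule: 1, 1/2, 1/3, ... per state-action pair."""
    return 1.0 / (1.0 + visits)


def reward(next_node):
    """Assumed reward: +10 for reaching the goal, -1 for any other move."""
    return 10.0 if next_node == GOAL else -1.0


def train(sweeps=200):
    """Run Q-learning updates over every roadmap edge, once per sweep."""
    q = defaultdict(float)     # Q[(node, next_node)], initialised to 0
    visits = defaultdict(int)  # number of updates per state-action pair
    for _ in range(sweeps):
        for node, neighbours in ROADMAP.items():
            for nxt in neighbours:
                # Value of the best move from the successor (0 at the goal).
                best_next = max((q[(nxt, n)] for n in ROADMAP[nxt]), default=0.0)
                target = reward(nxt) + GAMMA * best_next
                alpha = step_size(visits[(node, nxt)])
                q[(node, nxt)] += alpha * (target - q[(node, nxt)])
                visits[(node, nxt)] += 1
    return q


def greedy_path(q, start):
    """Follow the highest-valued edge from start until the goal is reached."""
    path = [start]
    # The length guard stops the walk if the greedy policy ever cycles.
    while path[-1] != GOAL and len(path) <= len(ROADMAP):
        node = path[-1]
        path.append(max(ROADMAP[node], key=lambda n: q[(node, n)]))
    return path


q_values = train()
print(greedy_path(q_values, 0))
```

On this toy graph the greedy path from node 0 is `[0, 1, 5]`, the shorter of the two routes to the goal. Shrinking the step size with the visit count means later samples perturb a well-visited Q-value less, which is one common way to stabilise the iteration; the paper's own rule may differ.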