RSMDP-Based Robust Q-Learning for Optimal Path Planning in a Dynamic Environment
Yunfei Zhang, Weilin Li, C. D. Silva
Int. J. Robotics Autom., 2014-03-01. DOI: 10.2316/Journal.206.2016.4.206-4255
This paper presents a robust Q-learning method for path planning in a dynamic environment. The method consists of three steps. First, a regime-switching Markov decision process (RSMDP) is formed to represent the dynamic environment. Second, a probabilistic roadmap (PRM) is constructed, integrated with the RSMDP, and stored as a graph whose nodes each correspond to a collision-free world state of the robot. Third, an online Q-learning method with a dynamic step size, which promotes robust convergence of the Q-value iteration, is integrated with the PRM to determine an optimal path to the goal. In this manner, the robot can use past experience to improve its performance in avoiding not only static obstacles but also moving obstacles, without knowing the nature of the obstacle motion. The use of regime switching to avoid obstacles whose motion is unknown is particularly innovative. The developed approach is applied to a homecare robot in computer simulation. The results show that the online path planner with Q-learning converges rapidly and successfully to the correct path.
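The second step (building a PRM and coupling it to the regimes) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's construction: the workspace is a 2-D unit square with circular obstacles, nodes are sampled uniformly, each node is linked to its k nearest neighbours, and edges are checked with a sampled straight-line collision test. The RSMDP coupling shown here (one hypothetical moving-obstacle footprint per regime, giving a per-regime mask of usable edges) is an assumption, since the abstract does not say how the two are integrated. The names point_free, segment_free, build_prm and regime_edges are hypothetical.

```python
import math
import random

# Hypothetical workspace: the unit square; obstacles are circles ((cx, cy), r).


def point_free(p, obstacles):
    """True if point p lies strictly outside every circular obstacle."""
    return all(math.dist(p, c) > r for c, r in obstacles)


def segment_free(p, q, obstacles, checks=25):
    """Approximate edge check: test evenly spaced points on segment p-q."""
    for i in range(checks + 1):
        t = i / checks
        point = (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))
        if not point_free(point, obstacles):
            return False
    return True


def build_prm(n_samples, k, static_obstacles, seed=0):
    """Sample collision-free nodes, then link each to its k nearest neighbours."""
    rng = random.Random(seed)
    nodes = []
    while len(nodes) < n_samples:
        p = (rng.random(), rng.random())
        if point_free(p, static_obstacles):   # keep only collision-free states
            nodes.append(p)
    edges = {i: set() for i in range(n_samples)}
    for i, p in enumerate(nodes):
        nearest = sorted((j for j in range(n_samples) if j != i),
                         key=lambda j: math.dist(p, nodes[j]))[:k]
        for j in nearest:
            if segment_free(p, nodes[j], static_obstacles):
                edges[i].add(j)
                edges[j].add(i)                # undirected roadmap
    return nodes, edges


def regime_edges(nodes, edges, regime_obstacles):
    """Assumed RSMDP coupling: for each regime m, keep the directed edges that
    also avoid that regime's moving-obstacle footprint (endpoints included)."""
    return {
        m: {(i, j) for i in edges for j in edges[i]
            if segment_free(nodes[i], nodes[j], obs)}
        for m, obs in regime_obstacles.items()
    }


static = [((0.5, 0.5), 0.15)]
# Hypothetical regimes: the moving obstacle occupies one of two places.
regimes = {0: [((0.25, 0.75), 0.1)], 1: [((0.75, 0.25), 0.1)]}
nodes, edges = build_prm(60, 6, static)
usable = regime_edges(nodes, edges, regimes)
print(len(nodes), sum(len(v) for v in edges.values()) // 2,
      {m: len(e) for m, e in usable.items()})
```

The roadmap itself is built once against the static obstacles, while each regime only filters which edges may be traversed, so a regime switch changes the usable graph without resampling. The sampled segment check can miss an obstacle thinner than the sample spacing; an exact segment-to-circle distance test would be tighter.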
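The third step can be illustrated with a toy, tabular version of online Q-learning over a roadmap whose state is the pair (roadmap node, regime). Everything concrete below is an assumption made for illustration and is not taken from the paper: the 8-node graph, the two regimes (a moving obstacle that either blocks node 2 or is elsewhere), the switching probability, the rewards (-1 per edge, -10 for a collision), the premise that the robot observes the current regime, and the step-size rule. The update is Q(s, a) <- Q(s, a) + alpha_n(s, a) [r + gamma max_b Q(s', b) - Q(s, a)], and the dynamic step size is taken to be the common polynomial schedule alpha_n(s, a) = n(s, a)^(-omega) with omega = 0.8, where n(s, a) counts the updates of that state-action pair. Any omega in (0.5, 1] makes the sum of the step sizes diverge while the sum of their squares converges, which are the step-size conditions used in standard Q-learning convergence proofs.

```python
import random

# Hypothetical roadmap: node -> neighbours. Start 0, goal 5.
# Short route 0-1-2-5; longer detour 0-3-4-6-7-5.
GRAPH = {0: [1, 3], 1: [0, 2], 2: [1, 5], 3: [0, 4],
         4: [3, 6], 6: [4, 7], 7: [6, 5], 5: []}
START, GOAL = 0, 5
# Assumed two-regime Markov chain for the moving obstacle:
# regime 0 = obstacle elsewhere, regime 1 = obstacle blocking node 2.
SWITCH_PROB = {0: 0.05, 1: 0.05}   # chance of leaving each regime per step
BLOCKED = {0: set(), 1: {2}}
GAMMA, EPSILON, OMEGA = 0.9, 0.2, 0.8


def step(node, regime, nxt, rng):
    """Move along a roadmap edge unless the target node is blocked."""
    if nxt in BLOCKED[regime]:
        node2, reward = node, -10.0      # collision: penalised, robot stays
    else:
        node2, reward = nxt, -1.0        # unit cost per roadmap edge
    regime2 = 1 - regime if rng.random() < SWITCH_PROB[regime] else regime
    return node2, regime2, reward


def greedy(q, node, regime):
    """Greedy action; untried actions default to 0 (optimistic, costs < 0)."""
    return max(GRAPH[node], key=lambda b: q.get((node, regime, b), 0.0))


def train(episodes=4000, max_steps=100, seed=0):
    rng = random.Random(seed)
    q, visits = {}, {}
    for _ in range(episodes):
        node, regime = START, rng.randint(0, 1)
        for _ in range(max_steps):
            if node == GOAL:
                break
            a = (rng.choice(GRAPH[node]) if rng.random() < EPSILON
                 else greedy(q, node, regime))          # epsilon-greedy
            node2, regime2, r = step(node, regime, a, rng)
            future = 0.0 if node2 == GOAL else q.get(
                (node2, regime2, greedy(q, node2, regime2)), 0.0)
            key = (node, regime, a)
            visits[key] = visits.get(key, 0) + 1
            alpha = visits[key] ** -OMEGA  # dynamic step size 1/n^omega
            old = q.get(key, 0.0)
            q[key] = old + alpha * (r + GAMMA * future - old)
            node, regime = node2, regime2
    return q


q = train()
print(greedy(q, START, 0), greedy(q, START, 1))
```

The toy numbers are chosen so that the short route is optimal while the obstacle is away and the detour is optimal while it blocks node 2; the printed pair is the greedy first move that the learned table picks in regime 0 and in regime 1. Because every reward is negative, the zero-initialised Q-values act as optimistic estimates, which pushes the robot to try untested edges early on.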