{"title":"基于轮盘赌算法和模拟退火策略的联合强化学习方法","authors":"Huang Jin-bo, Yang Rui-jun, Cheng Yan","doi":"10.1109/iciibms50712.2020.9336206","DOIUrl":null,"url":null,"abstract":"A combined Q and Sarsa algorithm based on united simulated annealing strategy proposed in order to solve the problems that the convergence speed of traditional reinforcement learning algorithm is slow. It could balance the relationship between trial-error and efficiency among different methods by a random dynamic adjustment factor method. The roulette algorithm is used to improve Q-learning, and the simulated annealing algorithm is used to replace the selection strategy of Sarsa algorithm, and the overall convergence rate of the algorithm is controlled by the annealing rate. Finally, the task of reward function is subdivided, and the reward function based on action decomposition is designed. The simulation results show that the improved path planning method can effectively reduce the time cost and the collision times of the first path finding.","PeriodicalId":243033,"journal":{"name":"2020 5th International Conference on Intelligent Informatics and Biomedical Sciences (ICIIBMS)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Joint Reinforcement Learning Method Based on Roulette Algorithm and Simulated Annealing Strategy\",\"authors\":\"Huang Jin-bo, Yang Rui-jun, Cheng Yan\",\"doi\":\"10.1109/iciibms50712.2020.9336206\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"A combined Q and Sarsa algorithm based on united simulated annealing strategy proposed in order to solve the problems that the convergence speed of traditional reinforcement learning algorithm is slow. It could balance the relationship between trial-error and efficiency among different methods by a random dynamic adjustment factor method. The roulette algorithm is used to improve Q-learning, and the simulated annealing algorithm is used to replace the selection strategy of Sarsa algorithm, and the overall convergence rate of the algorithm is controlled by the annealing rate. Finally, the task of reward function is subdivided, and the reward function based on action decomposition is designed. 
The simulation results show that the improved path planning method can effectively reduce the time cost and the collision times of the first path finding.\",\"PeriodicalId\":243033,\"journal\":{\"name\":\"2020 5th International Conference on Intelligent Informatics and Biomedical Sciences (ICIIBMS)\",\"volume\":\"44 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-11-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 5th International Conference on Intelligent Informatics and Biomedical Sciences (ICIIBMS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/iciibms50712.2020.9336206\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 5th International Conference on Intelligent Informatics and Biomedical Sciences (ICIIBMS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/iciibms50712.2020.9336206","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract: A combined Q-learning and Sarsa algorithm based on a unified simulated annealing strategy is proposed to address the slow convergence of traditional reinforcement learning algorithms. A random dynamic adjustment factor balances the trade-off between trial-and-error exploration and efficiency across the two methods. Roulette-wheel selection is used to improve the action selection of Q-learning, simulated annealing replaces the selection strategy of the Sarsa algorithm, and the annealing rate controls the overall convergence rate. Finally, the task of the reward function is subdivided, and a reward function based on action decomposition is designed. Simulation results show that the improved path planning method effectively reduces both the time cost and the number of collisions during the first path search.
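
The abstract names two selection mechanisms but gives no implementation details. As a minimal sketch, assuming a tabular setting, the Python snippet below pairs roulette-wheel (fitness-proportionate) action selection for the Q-learning side with a Metropolis-style simulated-annealing rule for the Sarsa side. The softmax weighting, the anneal_rate value, and all function names are illustrative assumptions, not the authors' code.

    import math
    import random

    def roulette_select(q_values, temperature=1.0):
        # Roulette-wheel (fitness-proportionate) selection: each action gets
        # a wheel slice proportional to a positive weight derived from its
        # Q-value. The softmax transform is an assumption; the paper only
        # names roulette selection.
        weights = [math.exp(q / temperature) for q in q_values]
        r = random.uniform(0.0, sum(weights))
        cumulative = 0.0
        for action, w in enumerate(weights):
            cumulative += w
            if r <= cumulative:
                return action
        return len(q_values) - 1  # guard against floating-point round-off

    def annealed_select(q_values, temperature):
        # Simulated-annealing selection (Metropolis criterion): a random
        # candidate replaces the greedy action with probability exp(dQ / T),
        # so exploration shrinks as the temperature is annealed toward zero.
        greedy = max(range(len(q_values)), key=lambda a: q_values[a])
        candidate = random.randrange(len(q_values))
        delta = q_values[candidate] - q_values[greedy]
        if delta >= 0 or random.random() < math.exp(delta / max(temperature, 1e-9)):
            return candidate
        return greedy

    # The annealing rate is what the abstract says controls the overall
    # convergence rate: faster decay means greedier, faster-converging behavior.
    temperature, anneal_rate = 1.0, 0.99   # assumed values
    q_row = [0.2, 0.5, 0.1, 0.4]           # hypothetical Q-values for one state
    for episode in range(3):
        a_q = roulette_select(q_row, temperature)      # Q-learning side
        a_sarsa = annealed_select(q_row, temperature)  # Sarsa side
        temperature *= anneal_rate                     # anneal each episode

Roulette selection keeps every action reachable in proportion to its estimated value, while the Metropolis rule concentrates probability mass on the greedy action as the temperature falls, which matches the abstract's claim that the annealing rate governs overall convergence.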