{"title":"基于轮盘赌算法和模拟退火策略的联合强化学习方法","authors":"Huang Jin-bo, Yang Rui-jun, Cheng Yan","doi":"10.1109/iciibms50712.2020.9336206","DOIUrl":null,"url":null,"abstract":"A combined Q and Sarsa algorithm based on united simulated annealing strategy proposed in order to solve the problems that the convergence speed of traditional reinforcement learning algorithm is slow. It could balance the relationship between trial-error and efficiency among different methods by a random dynamic adjustment factor method. The roulette algorithm is used to improve Q-learning, and the simulated annealing algorithm is used to replace the selection strategy of Sarsa algorithm, and the overall convergence rate of the algorithm is controlled by the annealing rate. Finally, the task of reward function is subdivided, and the reward function based on action decomposition is designed. The simulation results show that the improved path planning method can effectively reduce the time cost and the collision times of the first path finding.","PeriodicalId":243033,"journal":{"name":"2020 5th International Conference on Intelligent Informatics and Biomedical Sciences (ICIIBMS)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Joint Reinforcement Learning Method Based on Roulette Algorithm and Simulated Annealing Strategy\",\"authors\":\"Huang Jin-bo, Yang Rui-jun, Cheng Yan\",\"doi\":\"10.1109/iciibms50712.2020.9336206\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"A combined Q and Sarsa algorithm based on united simulated annealing strategy proposed in order to solve the problems that the convergence speed of traditional reinforcement learning algorithm is slow. It could balance the relationship between trial-error and efficiency among different methods by a random dynamic adjustment factor method. The roulette algorithm is used to improve Q-learning, and the simulated annealing algorithm is used to replace the selection strategy of Sarsa algorithm, and the overall convergence rate of the algorithm is controlled by the annealing rate. Finally, the task of reward function is subdivided, and the reward function based on action decomposition is designed. 
The simulation results show that the improved path planning method can effectively reduce the time cost and the collision times of the first path finding.\",\"PeriodicalId\":243033,\"journal\":{\"name\":\"2020 5th International Conference on Intelligent Informatics and Biomedical Sciences (ICIIBMS)\",\"volume\":\"44 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-11-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 5th International Conference on Intelligent Informatics and Biomedical Sciences (ICIIBMS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/iciibms50712.2020.9336206\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 5th International Conference on Intelligent Informatics and Biomedical Sciences (ICIIBMS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/iciibms50712.2020.9336206","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract: A combined Q-learning and Sarsa algorithm based on a unified simulated annealing strategy is proposed to address the slow convergence of traditional reinforcement learning algorithms. A random dynamic adjustment factor balances the trade-off between trial-and-error exploration and efficiency across the two methods. Roulette-wheel selection is used to improve the action selection of Q-learning, simulated annealing replaces the selection strategy of the Sarsa algorithm, and the annealing rate controls the overall convergence rate. Finally, the task of the reward function is subdivided, and a reward function based on action decomposition is designed. Simulation results show that the improved path planning method effectively reduces both the time cost and the number of collisions during the first path search.
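
The abstract names two selection mechanisms but gives no implementation details. As a minimal sketch, assuming a tabular setting, the Python snippet below pairs roulette-wheel (fitness-proportionate) action selection for the Q-learning side with a Metropolis-style simulated-annealing rule for the Sarsa side. The softmax weighting, the anneal_rate value, and all function names are illustrative assumptions, not the authors' code.

    import math
    import random

    def roulette_select(q_values, temperature=1.0):
        # Roulette-wheel (fitness-proportionate) selection: each action gets
        # a wheel slice proportional to a positive weight derived from its
        # Q-value. The softmax transform is an assumption; the paper only
        # names roulette selection.
        weights = [math.exp(q / temperature) for q in q_values]
        r = random.uniform(0.0, sum(weights))
        cumulative = 0.0
        for action, w in enumerate(weights):
            cumulative += w
            if r <= cumulative:
                return action
        return len(q_values) - 1  # guard against floating-point round-off

    def annealed_select(q_values, temperature):
        # Simulated-annealing selection (Metropolis criterion): a random
        # candidate replaces the greedy action with probability exp(dQ / T),
        # so exploration shrinks as the temperature is annealed toward zero.
        greedy = max(range(len(q_values)), key=lambda a: q_values[a])
        candidate = random.randrange(len(q_values))
        delta = q_values[candidate] - q_values[greedy]
        if delta >= 0 or random.random() < math.exp(delta / max(temperature, 1e-9)):
            return candidate
        return greedy

    # The annealing rate is what the abstract says controls the overall
    # convergence rate: faster decay means greedier, faster-converging behavior.
    temperature, anneal_rate = 1.0, 0.99   # assumed values
    q_row = [0.2, 0.5, 0.1, 0.4]           # hypothetical Q-values for one state
    for episode in range(3):
        a_q = roulette_select(q_row, temperature)      # Q-learning side
        a_sarsa = annealed_select(q_row, temperature)  # Sarsa side
        temperature *= anneal_rate                     # anneal each episode

Roulette selection keeps every action reachable in proportion to its estimated value, while the Metropolis rule concentrates probability mass on the greedy action as the temperature falls, which matches the abstract's claim that the annealing rate governs overall convergence.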