Dynamic Programming vs Q-learning for Feedback Motion Planning of Manipulators

U. Yıldıran
{"title":"Dynamic Programming vs Q-learning for Feedback Motion Planning of Manipulators","authors":"U. Yıldıran","doi":"10.1109/HORA58378.2023.10155782","DOIUrl":null,"url":null,"abstract":"Reinforcement Learning (RL) based methods have became popular for control and motion planning of robots, recently. Unlike sampling based motion planners, optimal policies computed by them provide feedback motion plans which eliminates the need for re-computing (optimal) trajectories when a robot starts from a different initial configuration each time. In related studies, an optimal policy (actor) and the associated value function (critic) are usually calculated preforming training in a simulation environment. During training, RL allows learning by interactions with the environment in a physically realistic manner. However, in a simulation system, it is possible to make physically unimplementable moves. Thus, instead of RL, one can make use of Dynamic Programming approaches such as Value Iteration for computing optimal policies, which does not require an exploration component and known to have better convergence properties. In addition, dimension of a value function is smaller than that of a Q-fuction, thereby lessening the severity of the curse of dimensionality. Motivated by these facts, the aim of this paper is to employ Value Iteration algorithm for motion planning of robot manipulators and elaborate its effectiveness compared to a popular RL method, Q-learning.","PeriodicalId":247679,"journal":{"name":"2023 5th International Congress on Human-Computer Interaction, Optimization and Robotic Applications (HORA)","volume":"33 5‐6","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 5th International Congress on Human-Computer Interaction, Optimization and Robotic Applications (HORA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HORA58378.2023.10155782","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Reinforcement Learning (RL) based methods have recently become popular for the control and motion planning of robots. Unlike sampling-based motion planners, the optimal policies they compute provide feedback motion plans, which eliminates the need to re-compute (optimal) trajectories whenever a robot starts from a different initial configuration. In related studies, an optimal policy (actor) and the associated value function (critic) are usually obtained by performing training in a simulation environment. During training, RL allows learning through interactions with the environment in a physically realistic manner. However, in a simulation system it is possible to make moves that are physically unimplementable. Thus, instead of RL, one can make use of Dynamic Programming approaches such as Value Iteration to compute optimal policies; Value Iteration does not require an exploration component and is known to have better convergence properties. In addition, the dimension of a value function is smaller than that of a Q-function, which lessens the severity of the curse of dimensionality. Motivated by these facts, the aim of this paper is to employ the Value Iteration algorithm for motion planning of robot manipulators and to evaluate its effectiveness compared to a popular RL method, Q-learning.
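To make the comparison concrete, below is a minimal sketch, not taken from the paper, of tabular Value Iteration and tabular Q-learning on a toy discretized state space. The 5x5 grid, the deterministic `step` dynamics, the unit step cost, and all hyperparameters (gamma, alpha, epsilon, episode counts) are hypothetical stand-ins for a gridded manipulator configuration space; the actual problem setup used in the paper may differ.

```python
# Illustrative sketch (assumptions, not the paper's implementation):
# cost-minimizing tabular Value Iteration vs. tabular Q-learning.
import numpy as np

n_states, n_actions = 25, 4      # hypothetical 5x5 grid over configurations
goal = n_states - 1
gamma = 0.95
rng = np.random.default_rng(0)

def step(s, a):
    """Toy deterministic dynamics: move one cell within the 5x5 grid."""
    row, col = divmod(s, 5)
    if a == 0:   row = max(row - 1, 0)
    elif a == 1: row = min(row + 1, 4)
    elif a == 2: col = max(col - 1, 0)
    else:        col = min(col + 1, 4)
    s_next = row * 5 + col
    cost = 0.0 if s_next == goal else 1.0
    return s_next, cost

# --- Value Iteration: full Bellman sweeps over all states, no exploration ---
V = np.zeros(n_states)
for _ in range(200):
    V_new = np.empty_like(V)
    for s in range(n_states):
        V_new[s] = min(c + gamma * V[s_next]
                       for s_next, c in (step(s, a) for a in range(n_actions)))
    if np.max(np.abs(V_new - V)) < 1e-6:
        V = V_new
        break
    V = V_new

# --- Q-learning: learns from sampled transitions, needs epsilon-greedy exploration ---
Q = np.zeros((n_states, n_actions))   # note the extra action dimension vs. V
alpha, eps = 0.1, 0.2
for episode in range(500):
    s = int(rng.integers(n_states))
    for _ in range(100):
        a = int(rng.integers(n_actions)) if rng.random() < eps else int(np.argmin(Q[s]))
        s_next, c = step(s, a)
        Q[s, a] += alpha * (c + gamma * np.min(Q[s_next]) - Q[s, a])
        s = s_next
        if s == goal:
            break

# Both yield a feedback plan: act greedily w.r.t. V (via the model) or w.r.t. Q.
policy_from_Q = np.argmin(Q, axis=1)
```

The sketch mirrors the points made in the abstract: the V table has only `n_states` entries while the Q table has `n_states * n_actions`, and Value Iteration performs exhaustive model-based sweeps with no exploration component, whereas Q-learning must sample transitions under an exploration scheme such as epsilon-greedy.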