Model-based DDPG for motor control

Haibo Shi, Yaoru Sun, Guangyuan Li
DOI: 10.1109/PIC.2017.8359558
2017 International Conference on Progress in Informatics and Computing (PIC), December 2017
Citations: 3

Abstract

The deep deterministic policy gradient (DDPG) is a recently developed reinforcement learning method that learns a control policy with a deterministic representation. Policy learning directly follows the gradient of the action-value function with respect to the actions. Similarly, the DDPG readily provides the gradient of the action-value function with respect to the state. This mechanism allows model information to be incorporated to improve the original DDPG. In this study, a model-based DDPG was implemented as an improvement to the original DDPG. An additional deep network was embedded into the framework of the conventional DDPG, and the gradient of the model dynamics is also exploited to maximize the action-value when learning the control policy. The model-based DDPG showed a relative advantage over the original DDPG in an experiment on simulated arm-reaching movement control.
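The core mechanism the abstract describes — updating the policy parameters along the gradient of the action-value function with respect to the action, chained through the deterministic policy — can be sketched in a few lines. The example below is a minimal illustration under simplifying assumptions (a linear policy, a known quadratic action-value, and the target map `T`), not the paper's actual network architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
n_state, n_action = 3, 2

# Hypothetical "best action" map a*(s) = T s, used to define a toy Q.
T = rng.normal(size=(n_action, n_state))
# Linear deterministic policy mu(s) = W s, initialized at zero.
W = np.zeros((n_action, n_state))

def q_grad_wrt_action(s, a):
    """Gradient of the toy action-value Q(s, a) = -||a - T s||^2 w.r.t. a."""
    return -2.0 * (a - T @ s)

lr = 0.05
for _ in range(2000):
    s = rng.normal(size=n_state)
    a = W @ s  # deterministic action mu(s)
    # Deterministic policy gradient: dJ/dW = (dQ/da) (dmu/dW) = outer(dQ/da, s)
    W += lr * np.outer(q_grad_wrt_action(s, a), s)

# The policy parameters converge to the optimal action map T.
print(np.allclose(W, T, atol=1e-2))
```

The paper's model-based variant would additionally backpropagate through a learned dynamics network f(s, a); the same chain rule applies, with an extra ∂f/∂a factor carrying the action-value gradient from the predicted next state back to the action.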