Model-based DDPG for motor control

Haibo Shi, Yaoru Sun, Guangyuan Li
DOI: 10.1109/PIC.2017.8359558
2017 International Conference on Progress in Informatics and Computing (PIC), December 2017
Citations: 3

Abstract

The deep deterministic policy gradient (DDPG) is a recently developed reinforcement learning method that learns a control policy with a deterministic representation. Policy learning directly follows the gradient of the action-value function with respect to the actions. Similarly, the DDPG readily provides the gradient of the action-value function with respect to the state. This mechanism allows model information to be incorporated to improve the original DDPG. In this study, a model-based DDPG was implemented as an improvement to the original DDPG. An additional deep network was embedded into the framework of the conventional DDPG, and the gradient of the model dynamics is also exploited to maximize the action-value when learning the control policy. The model-based DDPG showed a relative advantage over the original DDPG in an experiment on simulated arm-reaching movement control.
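The core mechanism the abstract describes — updating the policy parameters along the gradient of the action-value function with respect to the action, chained through the deterministic policy — can be sketched in a few lines. The example below is a minimal illustration under simplifying assumptions (a linear policy, a known quadratic action-value, and the target map `T`), not the paper's actual network architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
n_state, n_action = 3, 2

# Hypothetical "best action" map a*(s) = T s, used to define a toy Q.
T = rng.normal(size=(n_action, n_state))
# Linear deterministic policy mu(s) = W s, initialized at zero.
W = np.zeros((n_action, n_state))

def q_grad_wrt_action(s, a):
    """Gradient of the toy action-value Q(s, a) = -||a - T s||^2 w.r.t. a."""
    return -2.0 * (a - T @ s)

lr = 0.05
for _ in range(2000):
    s = rng.normal(size=n_state)
    a = W @ s  # deterministic action mu(s)
    # Deterministic policy gradient: dJ/dW = (dQ/da) (dmu/dW) = outer(dQ/da, s)
    W += lr * np.outer(q_grad_wrt_action(s, a), s)

# The policy parameters converge to the optimal action map T.
print(np.allclose(W, T, atol=1e-2))
```

The paper's model-based variant would additionally backpropagate through a learned dynamics network f(s, a); the same chain rule applies, with an extra ∂f/∂a factor carrying the action-value gradient from the predicted next state back to the action.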