Model-based DDPG for motor control
Haibo Shi, Yaoru Sun, Guangyuan Li
2017 International Conference on Progress in Informatics and Computing (PIC), 2017-12-01
DOI: 10.1109/PIC.2017.8359558 (https://doi.org/10.1109/PIC.2017.8359558)
Citation count: 3
Abstract
The deep deterministic policy gradient (DDPG) is a recently developed reinforcement learning method that learns a control policy with a deterministic representation. Policy learning directly follows the gradient of the action-value function with respect to the actions. DDPG likewise makes the gradient of the action-value function with respect to the state readily available. This mechanism allows model information to be incorporated to improve the original DDPG. In this study, a model-based DDPG was implemented as an improvement on the original DDPG. An additional deep network was embedded into the conventional DDPG framework; based on it, the gradient of the model dynamics is also exploited to maximize the action-value when learning the control policy. In an experiment on simulated arm-reaching movement control, the model-based DDPG showed a relative advantage over the original DDPG.
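The abstract does not give the update rule, so the following is only a minimal sketch of one possible reading: the usual DDPG action gradient dQ/da is augmented with a term that passes the critic's state gradient dQ/ds' back through a dynamics model s' = f(s, a). Everything here is a hypothetical stand-in and not the authors' implementation: the linear `model`, the quadratic `critic`, the weighting `GAMMA`, and the names `F_S`, `F_A`, and `TARGET` are assumptions chosen so the gradients can be written by hand and checked numerically.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM = 3, 2
GAMMA = 0.9  # assumed weight on the model-based term (not from the paper)

# Hypothetical stand-ins for the learned networks (assumptions for illustration).
F_S = 0.1 * rng.normal(size=(STATE_DIM, STATE_DIM))   # state part of the model
F_A = 0.1 * rng.normal(size=(STATE_DIM, ACTION_DIM))  # action part of the model
TARGET = np.ones(STATE_DIM)                           # toy reaching target


def model(s, a):
    """Toy linear dynamics model: predicted next state s' = F_S s + F_A a."""
    return F_S @ s + F_A @ a


def critic(s, a):
    """Toy quadratic action-value: penalize distance to target and effort."""
    return -np.sum((s - TARGET) ** 2) - 0.1 * np.sum(a ** 2)


def dq_da(s, a):
    """Gradient of the toy critic with respect to the action."""
    return -0.2 * a


def dq_ds(s, a):
    """Gradient of the toy critic with respect to the state."""
    return -2.0 * (s - TARGET)


def ddpg_action_gradient(s, a):
    """Original DDPG signal: follow dQ/da only."""
    return dq_da(s, a)


def model_based_action_gradient(s, a, a_next):
    """Assumed model-based signal: dQ/da plus dQ/ds' chained through ds'/da."""
    s_next = model(s, a)
    through_model = F_A.T @ dq_ds(s_next, a_next)  # (ds'/da)^T dQ/ds'
    return dq_da(s, a) + GAMMA * through_model


def combined_objective(s, a, a_next):
    """Scalar whose action gradient the function above is meant to compute."""
    return critic(s, a) + GAMMA * critic(model(s, a), a_next)


def finite_difference(s, a, a_next, eps=1e-6):
    """Central-difference check of the analytic gradient."""
    grad = np.zeros_like(a)
    for i in range(a.size):
        step = np.zeros_like(a)
        step[i] = eps
        upper = combined_objective(s, a + step, a_next)
        lower = combined_objective(s, a - step, a_next)
        grad[i] = (upper - lower) / (2 * eps)
    return grad


s = rng.normal(size=STATE_DIM)
a = rng.normal(size=ACTION_DIM)
a_next = rng.normal(size=ACTION_DIM)  # next action held fixed in this sketch

g_plain = ddpg_action_gradient(s, a)
g_model = model_based_action_gradient(s, a, a_next)
g_check = finite_difference(s, a, a_next)
```

In a full agent, either action gradient would be multiplied by the policy's Jacobian with respect to its parameters to update the actor. The sketch holds the next action fixed and omits that step to keep the chain rule through the model visible.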