Shuhuan Wen, Jianhua Chen, Shen Wang, Hong Zhang, Xueheng Hu
Path Planning of Humanoid Arm Based on Deep Deterministic Policy Gradient
2018 IEEE International Conference on Robotics and Biomimetics (ROBIO), published 2018-12-01
DOI: 10.1109/ROBIO.2018.8665248 (https://doi.org/10.1109/ROBIO.2018.8665248)
Citations: 19
Abstract
A robot arm with multiple degrees of freedom working in 3D space must avoid obstacles while its end effector performs a grasp, so obstacle-avoiding path planning is essential to completing a grasping task. This paper proposes a new obstacle avoidance algorithm based on an existing deep reinforcement learning framework, deep deterministic policy gradient (DDPG). Specifically, we use DDPG to plan the trajectory of a robot arm so that it avoids obstacles. The rewards are designed to overcome the difficulty of converging with multiple reward terms, especially when those terms are antagonistic to one another. The robot arm learns obstacle avoidance through self-learning with DDPG, and the convergence problem caused by the high-dimensional state input and multiple return values is solved. A simulation model of one arm of the Nao robot is built in the MuJoCo simulation environment. The simulation demonstrates that the proposed algorithm allows the robot arm to avoid obstacles successfully.
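The abstract does not give the form of the reward, so the following is only an illustrative sketch of how antagonistic reward terms might be combined into a single scalar for a DDPG agent. It assumes a goal-attraction term (negative distance from the end effector to the target) and an obstacle-repulsion term that is active only inside a safety radius. The function names, radii, and weights are hypothetical and are not taken from the paper.

```python
import math


def distance(a, b):
    """Euclidean distance between two 3D points given as sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def shaped_reward(effector, target, obstacle,
                  safe_radius=0.10, goal_radius=0.02,
                  collision_penalty=10.0, goal_bonus=10.0):
    """Combine goal attraction and obstacle repulsion into one scalar.

    All parameters are assumed values chosen for illustration only.
    """
    d_goal = distance(effector, target)
    d_obs = distance(effector, obstacle)

    # Attraction: the closer the end effector is to the target, the higher the reward.
    reward = -d_goal

    # Repulsion: penalise only inside the safety radius, growing linearly
    # as the end effector approaches the obstacle, so the two terms do not
    # compete when the arm is far from the obstacle.
    if d_obs < safe_radius:
        reward -= collision_penalty * (safe_radius - d_obs) / safe_radius

    # Sparse bonus once the end effector is within the goal tolerance.
    if d_goal < goal_radius:
        reward += goal_bonus

    return reward
```

Limiting the repulsion term to a safety radius is one common way to keep the two objectives from pulling against each other everywhere in the workspace; whether the authors use this approach is not stated in the abstract.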