{"title":"Development of Behavior based Robot manipulation using Actor-Critic architecture","authors":"Priya Shukla, Madhurjya Pegu, G. Nandi","doi":"10.1109/SPIN52536.2021.9566102","DOIUrl":null,"url":null,"abstract":"Developing behavior based robotic manipulation is a very challenging but necessary task to be solved, especially for humanoid and social robots. Fundamental robotic tasks such as grasping, pick and place, trajectory following are at present solved using conventional forward and inverse kinematics (IK), dynamics and trajectory planning, whereas we learn these complex tasks using past experiences. In this paper, we explore developing behavior based robotic manipulation using reinforcement learning, more specifically learning directly from experiences through interactions with the real world and without knowing the transition model of the environment. Here, we propose a multi agent paradigm to gather experiences from multiple environments in parallel along with a model for populating new generation of agents using Evolutionary Actor-Critic Algorithm (EACA). The agents are of actor-critic architecture and both of them comprises of general purpose neural networks. The actor-critic architecture enables the model to perform well both in high dimensional state space and high dimensional action space which is very crucial for all robotic applications. The proposed algorithm is benchmarked with respect to different multi agent paradigm but keeping the agent’s architecture same. Reinforcement learning, being highly data intensive, requires the use of the CPU and GPU cores to be done judiciously for sampling the environment as well as for training, the details of which have been described here. We have run rigorous experiments for learning joint trajectories on the open gym based KUKA arm manipulator, where our proposed method achieves learning stability within 300 episodes, as compared to the state-of-the-art actor-critic and Advanced Asynchronous Actor-Critic (A3C) algorithms both of which take more than 1000 episodes for learning the same task, showing the effectiveness of our proposed model.","PeriodicalId":343177,"journal":{"name":"2021 8th International Conference on Signal Processing and Integrated Networks (SPIN)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 8th International Conference on Signal Processing and Integrated Networks (SPIN)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SPIN52536.2021.9566102","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 2
Abstract
Developing behavior-based robotic manipulation is a challenging but necessary problem, especially for humanoid and social robots. Fundamental robotic tasks such as grasping, pick-and-place, and trajectory following are currently solved with conventional forward and inverse kinematics (IK), dynamics, and trajectory planning, whereas humans learn such complex tasks from past experience. In this paper, we explore developing behavior-based robotic manipulation using reinforcement learning, more specifically learning directly from experience gathered through interaction with the real world, without knowing the transition model of the environment. We propose a multi-agent paradigm that gathers experience from multiple environments in parallel, together with a model for populating new generations of agents using an Evolutionary Actor-Critic Algorithm (EACA). Each agent follows an actor-critic architecture, with both the actor and the critic implemented as general-purpose neural networks. The actor-critic architecture enables the model to perform well in both high-dimensional state spaces and high-dimensional action spaces, which is crucial for robotic applications. The proposed algorithm is benchmarked against different multi-agent paradigms while keeping the agent architecture the same. Reinforcement learning is highly data intensive, so CPU and GPU cores must be used judiciously for sampling the environment as well as for training; the details are described here. We have run rigorous experiments on learning joint trajectories with the OpenAI Gym-based KUKA arm manipulator, where our proposed method reaches learning stability within 300 episodes, whereas the state-of-the-art actor-critic and Asynchronous Advantage Actor-Critic (A3C) algorithms both take more than 1000 episodes to learn the same task, showing the effectiveness of our proposed model.
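To make the ideas in the abstract concrete, the sketch below shows a minimal actor-critic agent (actor and critic as separate general-purpose MLPs), a single TD(0)-style update on one transition, and a simplified "new generation" step in which the best-performing agents are copied and perturbed with noise. This is an illustrative reading of the abstract only, not the paper's EACA implementation: the network sizes, the fixed-variance Gaussian policy, the mutation scheme, and the 7-DoF KUKA state/action dimensions are all assumptions.

```python
# Illustrative sketch only; hyper-parameters and structure are assumptions,
# not the paper's EACA implementation.
import copy
import torch
import torch.nn as nn


class ActorCritic(nn.Module):
    """Actor and critic as small, general-purpose MLPs (no shared weights)."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 64):
        super().__init__()
        self.actor = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, action_dim), nn.Tanh(),  # bounded joint commands
        )
        self.critic = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),                      # state-value estimate
        )

    def forward(self, state: torch.Tensor):
        return self.actor(state), self.critic(state)


def actor_critic_update(agent, optimizer, state, action, reward, next_state, gamma=0.99):
    """One TD(0)-style actor-critic update on a single transition (sketch)."""
    value = agent.critic(state)
    with torch.no_grad():
        target = reward + gamma * agent.critic(next_state)
    advantage = (target - value).detach()
    # Assumed Gaussian policy with fixed unit variance, for illustration only.
    mean = agent.actor(state)
    log_prob = -0.5 * ((action - mean) ** 2).sum()
    loss = (-(log_prob * advantage) + (target - value).pow(2)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()


def next_generation(agents, returns, keep=2, noise_std=0.02):
    """Populate a new generation: keep the best agents by episodic return and
    fill the rest with noisy copies of them (a simplified stand-in for EACA)."""
    ranked = [a for _, a in sorted(zip(returns, agents), key=lambda p: -p[0])]
    elites = ranked[:keep]
    new_agents = [copy.deepcopy(e) for e in elites]
    while len(new_agents) < len(agents):
        child = copy.deepcopy(elites[len(new_agents) % keep])
        with torch.no_grad():
            for p in child.parameters():
                p.add_(noise_std * torch.randn_like(p))
        new_agents.append(child)
    return new_agents


if __name__ == "__main__":
    # Hypothetical dimensions for a 7-DoF arm with a 9-dimensional observation.
    population = [ActorCritic(state_dim=9, action_dim=7) for _ in range(4)]

    # One dummy transition update on the first agent.
    agent = population[0]
    opt = torch.optim.Adam(agent.parameters(), lr=1e-3)
    s, s_next, a = torch.randn(9), torch.randn(9), torch.randn(7)
    actor_critic_update(agent, opt, s, a, reward=torch.tensor(1.0), next_state=s_next)

    # In the multi-agent paradigm, each agent would roll out episodes in its
    # own environment copy; the collected returns then drive selection.
    dummy_returns = [10.0, 4.0, 7.5, 1.2]
    population = next_generation(population, dummy_returns)
    print(len(population), "agents in the new generation")
```

In the multi-agent setting described in the abstract, each agent in the population would interact with its own environment instance in parallel (e.g. on separate CPU workers), with training on the pooled experience placed on the GPU; the selection-and-mutation step above is only one plausible way such a population could be repopulated between generations.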