Guided Cost Learning for Lunar Lander Environment Using Human Demonstrated Expert Trajectories
Deepak S. Dharrao, S. Gite, Rahee Walambe
2023 International Conference on Advances in Intelligent Computing and Applications (AICAPS), 2023-02-01
DOI: 10.1109/AICAPS57044.2023.10074283 (https://doi.org/10.1109/AICAPS57044.2023.10074283)
Citations: 0
Abstract
Inverse Reinforcement Learning (IRL) is a subset of imitation learning in which the goal is to recover a reward function that captures an expert’s behavior from a set of demonstrations by that expert. Guided Cost Learning (GCL) is a recent IRL approach that represents the reward function as a neural network. In this paper, we explore the GCL algorithm and apply it to the Lunar Lander environment of OpenAI Gym. We generated our own set of human-demonstrated expert trajectories and implemented the GCL algorithm. We demonstrate that GCL can generate a reward that completely encapsulates the desired behavior depicted in the expert demonstrations, even for environments with high-dimensional state spaces such as Lunar Lander. We compare reward and policy evaluations under the environment’s actual reward function and under the GCL-generated reward function, and present the results.
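
The abstract does not spell out the objective that GCL optimizes. As a rough illustration only, the sketch below follows the commonly cited sample-based formulation of GCL: the learned cost (the negative of the reward) should be low on expert trajectories, while a partition-function term, estimated by importance sampling from the current policy, keeps the cost from collapsing everywhere. The function names `log_mean_exp` and `gcl_cost_loss` and their arguments are hypothetical, the inputs are assumed to be per-trajectory sums that have already been computed, and this is not the authors' implementation.

```python
import math
from typing import Sequence


def log_mean_exp(values: Sequence[float]) -> float:
    """Numerically stable log of the mean of exp(values)."""
    peak = max(values)
    total = sum(math.exp(v - peak) for v in values)
    return peak + math.log(total / len(values))


def gcl_cost_loss(
    demo_costs: Sequence[float],
    sample_costs: Sequence[float],
    sample_log_probs: Sequence[float],
) -> float:
    """Sample-based estimate of the demonstrations' negative log-likelihood.

    demo_costs: learned cost summed over each expert trajectory.
    sample_costs: learned cost summed over each trajectory drawn from the
        current sampling policy q (assumed input, computed elsewhere).
    sample_log_probs: log q(trajectory) for those same sampled trajectories.
    """
    if len(sample_costs) != len(sample_log_probs):
        raise ValueError("each sampled trajectory needs a cost and a log-prob")
    # First term: average cost assigned to the expert demonstrations.
    demo_term = sum(demo_costs) / len(demo_costs)
    # Second term: log Z, with Z approximated by the importance-weighted
    # average of exp(-cost) / q over the sampled trajectories.
    log_weights = [-c - lq for c, lq in zip(sample_costs, sample_log_probs)]
    return demo_term + log_mean_exp(log_weights)
```

In a full training loop this loss would be minimized with respect to the cost network's parameters, alternating with policy updates that draw new samples. The sketch omits both steps, as well as refinements such as mixing the demonstrations into the sample set.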