Combining Reward Shaping and Curriculum Learning for Training Agents with High Dimensional Continuous Action Spaces
Authors: Sooyoung Jang, Mikyong Han
Published in: 2018 International Conference on Information and Communication Technology Convergence (ICTC), 2018-10-01
DOI: 10.1109/ICTC.2018.8539438 (https://doi.org/10.1109/ICTC.2018.8539438)
Citations: 5
Abstract
The need to train agents with high-dimensional continuous action spaces will grow as robot hardware such as robotic arms and humanoid robots becomes more sophisticated. However, training such agents is difficult and time-consuming. To tackle this problem, we combine reward shaping and curriculum learning. More specifically, the agent receives a reward at every step it takes, and the difficulty of the problem increases gradually as the agent's learning progresses. Both the reward function and the curriculum are designed to guide the agent toward its objective. Simulation results show that the proposed scheme outperforms the compared schemes.
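
The two ideas in the abstract, a reward at every step and a difficulty that rises with the agent's learning, can be illustrated with a minimal sketch. The abstract does not give the paper's actual reward function or curriculum, so everything below is an assumption made for illustration: a hypothetical reaching task, a distance-based progress reward, the names `shaped_reward` and `Curriculum`, and all numeric defaults.

```python
from collections import deque


def shaped_reward(prev_dist, curr_dist, goal_radius,
                  step_penalty=0.01, goal_bonus=1.0):
    """Dense per-step reward (hypothetical form, not taken from the paper).

    Rewards progress toward the target on every step, charges a small
    time cost, and adds a bonus once the agent is within goal_radius.
    """
    reward = (prev_dist - curr_dist) - step_penalty
    if curr_dist <= goal_radius:
        reward += goal_bonus
    return reward


class Curriculum:
    """Raises the difficulty level when the recent success rate is high.

    Difficulty is modeled here as a shrinking goal radius; this is an
    assumed stand-in for whatever task parameter a real setup would vary.
    """

    def __init__(self, max_level=5, window=20, threshold=0.8):
        self.level = 0
        self.max_level = max_level
        self.threshold = threshold
        self.results = deque(maxlen=window)  # outcomes of recent episodes

    def goal_radius(self, easiest=0.5, hardest=0.05):
        # Interpolate linearly from the easiest to the hardest radius.
        frac = self.level / self.max_level
        return easiest + frac * (hardest - easiest)

    def record(self, success):
        """Store one episode outcome and advance the level if warranted."""
        self.results.append(bool(success))
        window_full = len(self.results) == self.results.maxlen
        if window_full and self.level < self.max_level:
            if sum(self.results) / len(self.results) >= self.threshold:
                self.level += 1
                self.results.clear()  # start a fresh window at the new level
```

In a training loop, `goal_radius()` would be read at the start of each episode, `shaped_reward` would be called after every environment step, and `record` would be called once the episode ends. Clearing the window after each promotion is one possible design choice; it prevents successes at an easier level from counting toward the next promotion.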