Reinforcement Learning for Cart Pole Inverted Pendulum System
A. Surriani, O. Wahyunggoro, A. Cahyadi
2021 IEEE Industrial Electronics and Applications Conference (IEACon), published 2021-11-22
DOI: 10.1109/IEACon51066.2021.9654440
Citations: 4
Abstract
Recently, reinforcement learning has come to be regarded as a method of choice for solving many problems. One of the challenging problems is controlling systems with dynamic behaviour. This paper uses policy gradient methods to balance a cart-pole inverted pendulum: the goal is to keep the pole upright through movement of the cart. Two main policy-gradient-based algorithms are employed. The results show that PG with a baseline completes its training episodes faster than REINFORCE PG, while the REINFORCE PG algorithm achieves a higher cumulative reward than PG with a baseline.
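The two algorithms the abstract compares are standard REINFORCE and a baseline-subtracted policy gradient. The paper gives no implementation details, but the step where the two variants differ, weighting each log-probability gradient by the return versus by a baseline-adjusted advantage, can be sketched as follows (the constant mean-return baseline and function names are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = sum_k gamma^k * r_{t+k} over one episode."""
    G = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        G[t] = running
    return G

def policy_gradient_weights(rewards, gamma=0.99, use_baseline=True):
    """REINFORCE scales each grad log pi(a_t|s_t) by G_t; the baseline
    variant subtracts an estimate b (here simply the mean return) to
    reduce gradient variance without biasing its expectation."""
    G = discounted_returns(rewards, gamma)
    if use_baseline:
        return G - G.mean()  # advantage w.r.t. a constant baseline
    return G
```

For cart-pole, the rewards would typically be +1 per timestep the pole stays upright; subtracting the baseline centres the weights around zero, which is the usual explanation for the faster training the abstract reports for the baseline variant.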