{"title":"Extension of Improved Penalty Avoiding Rational Policy Making algorithm to tile coding environment for keepaway tasks","authors":"Takuji Watanabe, K. Miyazaki, Hiroaki Kobayashi","doi":"10.1109/SICE.2008.4654997","DOIUrl":null,"url":null,"abstract":"We focus on potential capability of a profit sharing method (PS) in non-Markov multi-agent environments. It is shown that PS has some rationality in non-Markov environments and is also effective in multi-agent environments. However, conventional PS uses only a reward to learn suitable rules. On the other hand. ldquopenalty avoiding rational policy making algorithm (PARP)rdquo is based on PS and uses not only a reward but also penalties. PARP is improved to save memories and to cope with uncertainties, which is known as ldquoimproved penalty avoiding rational policy making algorithm (improved PARP).rdquo There is another critical problem we must cope with when we apply PS based methods to real environments; we need a huge amount of state information and most of states take continuous values. One solution for this problem is to approximate the states with a function approximation method, e.g. tile coding. In this paper, first, we extend improved penalty avoiding rational policy making algorithm to tile coding environments. Then, we compare the extended method with conventional methods to show the effectiveness through an application to a keepaway task in a soccer game.","PeriodicalId":152347,"journal":{"name":"2008 SICE Annual Conference","volume":"54 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2008-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2008 SICE Annual Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SICE.2008.4654997","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 2
Abstract
We focus on the potential capability of the profit sharing method (PS) in non-Markov multi-agent environments. It has been shown that PS retains a degree of rationality in non-Markov environments and is also effective in multi-agent environments. However, conventional PS uses only rewards to learn suitable rules. In contrast, the "penalty avoiding rational policy making algorithm (PARP)" is based on PS and uses penalties in addition to rewards. PARP has been improved to reduce memory requirements and to cope with uncertainty; the result is known as the "improved penalty avoiding rational policy making algorithm (improved PARP)". There is another critical problem we must address when applying PS-based methods to real environments: they require a huge amount of state information, and most states take continuous values. One solution is to approximate the states with a function approximation method, e.g., tile coding. In this paper, we first extend the improved penalty avoiding rational policy making algorithm to tile coding environments. We then compare the extended method with conventional methods and demonstrate its effectiveness through an application to a keepaway task in a soccer game.
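To make the role of tile coding concrete, the sketch below shows how a continuous state vector (such as the distance and angle features used in keepaway) can be mapped to a small set of active tiles across several offset tilings. This is a minimal illustration of generic tile coding, not the authors' implementation; all function names, grid sizes, and offsets here are assumptions for exposition.

```python
# Minimal sketch of tile coding for continuous states (illustrative only;
# the paper's actual discretization parameters are not specified here).
import numpy as np

def tile_indices(state, lows, highs, tiles_per_dim=8, num_tilings=4):
    """Map a continuous state vector to one active tile index per tiling.

    Each tiling partitions the normalized state space into a uniform grid;
    successive tilings are offset by a fraction of a tile width, so a state
    activates num_tilings overlapping tiles in total.
    """
    state = np.asarray(state, dtype=float)
    lows, highs = np.asarray(lows, dtype=float), np.asarray(highs, dtype=float)
    scaled = (state - lows) / (highs - lows) * tiles_per_dim  # grid units
    indices = []
    for t in range(num_tilings):
        offset = t / num_tilings                 # shift each tiling slightly
        coords = np.floor(scaled + offset).astype(int)
        coords = np.clip(coords, 0, tiles_per_dim - 1)
        # Flatten per-dimension grid coordinates into a single tile index.
        flat = 0
        for c in coords:
            flat = flat * tiles_per_dim + int(c)
        indices.append(t * tiles_per_dim ** len(state) + flat)
    return indices

# Example: a 2-D state, e.g. a distance feature and an angle feature.
active = tile_indices([3.2, 0.7], lows=[0.0, 0.0], highs=[10.0, 3.14])
print(active)  # one active tile per tiling
```

Under this kind of encoding, a learner such as PS or improved PARP would store rule values (and penalty information) per tile rather than per raw continuous state, which is what makes the extension to continuous keepaway state spaces feasible.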