Hybrid of Evolution and Reinforcement Learning for Othello Players
Kyung-Joong Kim, He-Seong Choi, Sung-Bae Cho. DOI: 10.1109/CIG.2007.368099

Although reinforcement learning and evolutionary algorithms both show good results in board evaluation optimization, hybrids of the two approaches are rarely addressed in the literature. In this paper, the evolutionary algorithm is boosted with resources from reinforcement learning: 1) the initial population is seeded with a solution optimized by temporal difference learning, and 2) domain knowledge extracted from reinforcement learning is exploited. Experiments on Othello game strategies show that the proposed methods can effectively search the solution space and improve performance.
Board Representations for Neural Go Players Learning by Temporal Difference
H. A. Mayer. DOI: 10.1109/CIG.2007.368096

The majority of work on artificial neural networks (ANNs) playing the game of Go focuses on network architectures and training regimes to improve the quality of the neural player. A less investigated problem is the board representation that conveys the information on the current state of the game to the network. Common approaches suggest a straightforward encoding that assigns each point on the board to one (or more) input neurons. However, these basic representations do not capture elementary structural relationships between stones (and points) that are essential to the game. We compare three different board representations for self-learning ANNs on a 5×5 board employing temporal difference learning (TDL) with two types of move selection (during training). The strength of the trained networks is evaluated in games against three computer players of different quality. A tournament of the best neural players, the addition of alpha-beta search, and a commented game of a neural player against the best computer player further explore the potential of the neural players and their respective board representations.
Temporal Difference Learning of an Othello Evaluation Function for a Small Neural Network with Shared Weights
E. Manning. DOI: 10.1109/CIG.2007.368101

This paper presents an artificial neural network with shared weights, trained to play the game of Othello by self-play with temporal difference learning (TDL). The network performs as well as the champion of the CEC 2006 Othello Evaluation Function Competition. The TDL-trained network contains only 67 unique weights, compared to 2113 for the champion.
Using Stochastic AI Techniques to Achieve Unbounded Resolution in Finite Player Goore Games and its Applications
B. Oommen, Ole-Christoffer Granmo, A. Pedersen. DOI: 10.1109/CIG.2007.368093

The Goore Game (GG), introduced by M. L. Tsetlin in 1973, has the fascinating property that it can be resolved in a completely distributed manner with no intercommunication between the players. The game has recently found applications in many domains, including the field of sensor networks and quality-of-service (QoS) routing. In actual implementations of the solution, the players are typically replaced by learning automata (LA). The problem with the existing reported approaches is that the accuracy of the solution achieved is intricately related to the number of players participating in the game, which, in turn, determines the resolution. In other words, an arbitrary accuracy can be obtained only if the game has an infinite number of players. In this paper, we show how we can attain an unbounded accuracy for the GG by utilizing no more than three stochastic learning machines, and by recursively pruning the solution space to guarantee that the retained domain contains the solution to the game with a probability as close to unity as desired. The paper also conjectures on how the solution can be applied to some of the application domains.
Concept Accessibility as Basis for Evolutionary Reinforcement Learning of Dots and Boxes
Anthony Knittel, T. Bossomaier, A. Snyder. DOI: 10.1109/CIG.2007.368090

The challenge of creating teams of agents, which evolve or learn, to solve complex problems is addressed in the combinatorially complex game of dots and boxes (strings and coins). Previous evolutionary reinforcement learning (ERL) systems approaching this task with dynamic agent populations have shown some degree of success in game play; however, they are sensitive to conditions, suffer from unstable agent populations under difficult play, and develop poorly against an easier opponent. A novel technique for preserving stability and balancing specialised and generalised rules in an ERL system is presented, motivated by the accessibility of concepts in human cognition, as opposed to the natural selection through population survivability common to ERL systems. Reinforcement learning in dynamic teams of mutable agents enables play comparable to hand-crafted artificial players. Performance and stability of development are enhanced when a measure of the frequency of reinforcement is separated from the quality measure of rules.
Reward Allotment Considered Roles for Learning Classifier System For Soccer Video Games
Yosuke Akatsuka, Yuji Sato. DOI: 10.1109/CIG.2007.368111

In recent years, the video-game environment has begun to change due to the explosive growth of the Internet. As a result, maintenance takes longer, development costs increase, and the life cycle of a game program shortens. To address this problem, we previously proposed an event-driven hybrid learning classifier system and showed that it is effective for improving the winning rate and shortening the learning time. This paper investigates the effect of applying a reward allotment that considers each player's role to the learning classifier system. Concretely, we examine how each player's actions are influenced by changing the opponent's algorithm, and how the team strategy is influenced by changing the reward setting, and analyze the results. We show that the learning effect on each player's actions does not depend on the opponent's algorithm, and that reward allotment considering each role can evolve the game strategy so as to improve the winning rate.
Evolutionary Computations for Designing Game Rules of the COMMONS GAME
H. Handa, N. Baba. DOI: 10.1109/CIG.2007.368117

In this paper, we focus on game rule design using two evolutionary computations. The first EC is a multi-objective evolutionary algorithm (MOEA) used to generate various skilled players. Using the acquired skilled players, i.e., the Pareto individuals of the MOEA, the second EC (evolutionary programming) adjusts the game rule parameters, i.e., an appropriate point value for each card in the COMMONS GAME.
Game AI in Delta3D
C. Darken, Bradley G. Anderegg, Perry McDowell. DOI: 10.1109/CIG.2007.368114

Delta3D is a GNU-licensed open source game engine with an orientation towards supporting "serious games" such as those with defense and homeland security applications. AI is an important issue for serious games, since there is more pressure to "get the AI right", as opposed to providing an entertaining user experience. We describe several of our near- and longer-term AI projects oriented towards making it easier to build AI-enhanced applications in Delta3D.
Evolving Players for an Ancient Game: Hnefatafl
P. Hingston. DOI: 10.1109/CIG.2007.368094

Hnefatafl is an ancient Norse game, an ancestor of chess. In this paper, we report on the development of computer players for this game. In the spirit of Blondie24, we evolve neural networks as board evaluation functions for different versions of the game. An unusual aspect of this game is that there is no general agreement on the rules: it is no longer much played, and game historians attempt to infer the rules from scraps of historical texts, with ambiguities often resolved on gut feeling as to what the rules must have been in order to achieve a balanced game. We offer the evolutionary method as a means by which to judge the merits of alternative rule sets.
Evolving Pac-Man Players: Can We Learn from Raw Input?
M. Gallagher, M. Ledwich. DOI: 10.1109/CIG.2007.368110

Pac-Man (and variant) computer games have received some recent attention in artificial intelligence research. One reason is that the game provides a platform that is both simple enough to conduct experimental research and complex enough to require non-trivial strategies for successful game-play. This paper describes an approach to developing Pac-Man playing agents that learn game-play based on minimal onscreen information. The agents are based on evolving neural network controllers using a simple evolutionary algorithm. The results show that neuroevolution is able to produce agents that display novice playing ability, with a minimal amount of onscreen information, no knowledge of the rules of the game and a minimally informative fitness function. The limitations of the approach are also discussed, together with possible directions for extending the work towards producing better Pac-Man playing agents.