CHANCEPROBCUT: Forward pruning in chance nodes
Pub Date: 2009-09-07 | DOI: 10.1109/CIG.2009.5286476
Maarten P. D. Schadd, M. Winands, J. Uiterwijk
This article describes CHANCEPROBCUT, a new, game-independent forward-pruning technique for EXPECTIMAX, and the first technique to forward prune in chance nodes. Based on the strong correlation between evaluations obtained from searches at different depths, the technique prunes chance events when the result of the chance node is likely to fall outside the search window. CHANCEPROBCUT is tested in two games, Stratego and Dice. Experiments reveal that the technique reduces the search tree significantly without loss of move quality; moreover, playing performance increases in both games.
{"title":"CHANCEPROBCUT: Forward pruning in chance nodes","authors":"Maarten P. D. Schadd, M. Winands, J. Uiterwijk","doi":"10.1109/CIG.2009.5286476","DOIUrl":"https://doi.org/10.1109/CIG.2009.5286476","url":null,"abstract":"This article describes a new, game-independent forward-pruning technique for EXPECTIMAX, called CHANCEPROBCUT. It is the first technique to forward prune in chance nodes. Based on the strong correlation between evaluations obtained from searches at different depths, the technique prunes chance events if the result of the chance node is likely to fall outside the search window. In this article, CHANCEPROBCUT is tested in two games, i.e., Stratego and Dice. Experiments reveal that the technique is able to reduce the search tree significantly without a loss of move quality. Moreover, in both games there is also an increase of playing performance.","PeriodicalId":358795,"journal":{"name":"2009 IEEE Symposium on Computational Intelligence and Games","volume":"166 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-09-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114684496","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Coevolutionary Temporal Difference Learning for Othello
Pub Date: 2009-09-07 | DOI: 10.1109/CIG.2009.5286486
M. Szubert, Wojciech Jaśkowski, K. Krawiec
This paper presents Coevolutionary Temporal Difference Learning (CTDL), a novel way of hybridizing coevolutionary search with reinforcement learning that works by interlacing one-population competitive coevolution with temporal difference learning. The coevolutionary part of the algorithm provides exploration of the solution space, while temporal difference learning performs exploitation through local search. We apply CTDL to the board game of Othello, using a weighted piece counter to represent players' strategies. The results of an extensive computational experiment demonstrate CTDL's superiority over coevolution and reinforcement learning alone, particularly when coevolution maintains an archive to preserve historical progress. The paper investigates the relative intensity of coevolutionary search and temporal difference learning, which turns out to be an essential parameter. The formulation of CTDL also leads to the introduction of a Lamarckian form of coevolution, which we discuss in detail.
{"title":"Coevolutionary Temporal Difference Learning for Othello","authors":"M. Szubert, Wojciech Jaśkowski, K. Krawiec","doi":"10.1109/CIG.2009.5286486","DOIUrl":"https://doi.org/10.1109/CIG.2009.5286486","url":null,"abstract":"This paper presents Coevolutionary Temporal Difference Learning (CTDL), a novel way of hybridizing co-evolutionary search with reinforcement learning that works by interlacing one-population competitive coevolution with temporal difference learning. The coevolutionary part of the algorithm provides for exploration of the solution space, while the temporal difference learning performs its exploitation by local search. We apply CTDL to the board game of Othello, using weighted piece counter for representing players' strategies. The results of an extensive computational experiment demonstrate CTDL's superiority when compared to coevolution and reinforcement learning alone, particularly when coevolution maintains an archive to provide historical progress. The paper investigates the role of the relative intensity of coevolutionary search and temporal difference search, which turns out to be an essential parameter. The formulation of CTDL leads also to the introduction of Lamarckian form of coevolution, which we discuss in detail.","PeriodicalId":358795,"journal":{"name":"2009 IEEE Symposium on Computational Intelligence and Games","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-09-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114789965","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Backpropagation without human supervision for visual control in Quake II
Pub Date: 2009-09-07 | DOI: 10.1109/CIG.2009.5286462
M. Parker, B. D. Bryant
Backpropagation and neuroevolution are used in a Lamarckian evolution process to train a neural-network visual controller for agents in the Quake II environment. In previous work we hand-coded a non-visual controller to supervise the backpropagation, but hand-coding is only possible for problems with known solutions. In this research the agent's task is to attack a moving enemy in a visually complex room with a large central pillar. Because we did not know a solution to the problem, we could not hand-code a supervising controller; instead, we evolve a non-visual neural network as supervisor to the visual controller. This setup creates controllers that learn much faster and reach greater fitness than those trained by neuroevolution alone on the same problem in the same amount of time.
{"title":"Backpropagation without human supervision for visual control in Quake II","authors":"M. Parker, B. D. Bryant","doi":"10.1109/CIG.2009.5286462","DOIUrl":"https://doi.org/10.1109/CIG.2009.5286462","url":null,"abstract":"Backpropagation and neuroevolution are used in a Lamarckian evolution process to train a neural network visual controller for agents in the Quake II environment. In previous work, we hand-coded a non-visual controller for supervising in backpropagation, but hand-coding can only be done for problems with known solutions. In this research the problem for the agent is to attack a moving enemy in a visually complex room with a large central pillar. Because we did not know a solution to the problem, we could not hand-code a supervising controller; instead, we evolve a non-visual neural network as supervisor to the visual controller. This setup creates controllers that learn much faster and have a greater fitness than those learning by neuroevolution-only on the same problem in the same amount of time.","PeriodicalId":358795,"journal":{"name":"2009 IEEE Symposium on Computational Intelligence and Games","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-09-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126157771","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deceptive strategies for the evolutionary minority game
Pub Date: 2009-09-07 | DOI: 10.1109/CIG.2009.5286499
G. Greenwood
The evolutionary minority game is extensively used to study adaptive behavior in a population of interacting agents. Over time the agents self-organize, despite the fact that each agent chooses how to play independently and does not know the play of any other agent. In this paper we study agents who collude with each other to play the same strategy. However, nothing prevents an agent from being deceptive and playing a different strategy instead. It is shown that deceptive strategies can be profitable if the number of deceptive agents is small enough.
{"title":"Deceptive strategies for the evolutionary minority game","authors":"G. Greenwood","doi":"10.1109/CIG.2009.5286499","DOIUrl":"https://doi.org/10.1109/CIG.2009.5286499","url":null,"abstract":"The evolutionary minority game is extensively used to study adaptive behavior in a population of interacting agents. In time the agents self-organize despite the fact agents act independently in choosing how to play the game and do not know the play of any other agent. In this paper we study agents who collude with each other to play the same strategy. However, nothing prevents agents from being deceptive and playing a different strategy instead. It is shown that deceptive strategies can be profitable if the number of deceptive agents is small enough.","PeriodicalId":358795,"journal":{"name":"2009 IEEE Symposium on Computational Intelligence and Games","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-09-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116327310","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Improving testing of multi-unit computer players for unwanted behavior using coordination macros
Pub Date: 2009-09-07 | DOI: 10.1109/CIG.2009.5286455
Attala Malik, J. Denzinger
We present an improvement to behavior testing of computer players based on evolutionary learning of cooperative behavior, extending the known approach with so-called coordination macros. These macros encode knowledge about the application and are interpreted by the testing agents, based on the current situation, to achieve coordination among them. Our experimental evaluation, which used this approach to test computer players for one competition scenario of the ORTS real-time strategy game, showed that the macros enabled the testing system to find weaknesses much faster than the previous approach, and to find weaknesses that the previous approach could not find within the given resource limit.
{"title":"Improving testing of multi-unit computer players for unwanted behavior using coordination macros","authors":"Attala Malik, J. Denzinger","doi":"10.1109/CIG.2009.5286455","DOIUrl":"https://doi.org/10.1109/CIG.2009.5286455","url":null,"abstract":"We present an improvement to behavior testing of computer players based on evolutionary learning of cooperative behavior that extends the known approach to allow for so-called coordination macros. These macros represent knowledge about the application and are interpreted by the agents that are testing the computer player based on the current situation to achieve coordination between the agents. Our experimental evaluation using this approach to test computer players for one competition scenario of the ORTS real-time strategy game showed that the macros enabled the testing system to find weaknesses much faster than the previous approach, respectively to find weaknesses that the previous approach was not able to find within the given resource limit.","PeriodicalId":358795,"journal":{"name":"2009 IEEE Symposium on Computational Intelligence and Games","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-09-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127217608","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Temporal difference learning with interpolated table value functions
Pub Date: 2009-09-07 | DOI: 10.1109/CIG.2009.5286496
S. Lucas
This paper introduces a novel function approximation architecture especially well suited to temporal difference learning. The architecture is based on sets of interpolated table look-up functions. These offer rapid and stable learning, and are efficient when the number of inputs is small. An empirical investigation tests their performance on a supervised learning task and on the mountain car problem, a standard reinforcement learning benchmark. In each case, the interpolated table functions offer competitive performance.
{"title":"Temporal difference learning with interpolated table value functions","authors":"S. Lucas","doi":"10.1109/CIG.2009.5286496","DOIUrl":"https://doi.org/10.1109/CIG.2009.5286496","url":null,"abstract":"This paper introduces a novel function approximation architecture especially well suited to temporal difference learning. The architecture is based on using sets of interpolated table look-up functions. These offer rapid and stable learning, and are efficient when the number of inputs is small. An empirical investigation is conducted to test their performance on a supervised learning task, and on themountain car problem, a standard reinforcement learning benchmark. In each case, the interpolated table functions offer competitive performance.","PeriodicalId":358795,"journal":{"name":"2009 IEEE Symposium on Computational Intelligence and Games","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-09-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127225939","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Learning drivers for TORCS through imitation using supervised methods
Pub Date: 2009-09-07 | DOI: 10.1109/CIG.2009.5286480
L. Cardamone, D. Loiacono, P. Lanzi
In this paper we apply imitation learning to develop drivers for The Open Racing Car Simulator (TORCS). Our approach can be classified as a direct method in that it applies supervised learning to learn car-racing behaviors from data collected from other drivers. In the literature, this approach is known to have led to extremely poor performance, with drivers capable of completing only very small parts of a track. In this paper we show that, by using high-level information about the track ahead of the car and by predicting high-level actions, it is possible to develop drivers whose performance is in some cases only 15% below that of the fastest driver available in TORCS. Our experimental results suggest that our approach can effectively develop drivers that perform well on non-trivial tracks using a very limited amount of data and computational resources. We analyze the driving behavior of the resulting controllers and identify perceptual aliasing as one factor limiting the performance of our approach.
{"title":"Learning drivers for TORCS through imitation using supervised methods","authors":"L. Cardamone, D. Loiacono, P. Lanzi","doi":"10.1109/CIG.2009.5286480","DOIUrl":"https://doi.org/10.1109/CIG.2009.5286480","url":null,"abstract":"In this paper, we apply imitation learning to develop drivers for The Open Racing Car Simulator (TORCS). Our approach can be classified as a direct method in that it applies supervised learning to learn car racing behaviors from the data collected from other drivers. In the literature, this approach is known to have led to extremely poor performance with drivers capable of completing only very small parts of a track. In this paper we show that, by using high-level information about the track ahead of the car and by predicting high-level actions, it is possible to develop drivers with performances that in some cases are only 15% lower than the performance of the fastest driver available in TORCS. Our experimental results suggest that our approach can be effective in developing drivers with good performance in non-trivial tracks using a very limited amount of data and computational resources. We analyze the driving behavior of the controllers developed using our approach and identify perceptual aliasing as one of the factors which can limit performance of our approach.","PeriodicalId":358795,"journal":{"name":"2009 IEEE Symposium on Computational Intelligence and Games","volume":"118 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-09-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131549477","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chuck Norris rocks!
Pub Date: 2009-09-07 | DOI: 10.1109/CIG.2009.5286493
D. Cuadrado, Y. Sáez
In this introductory work we present our first approach to a computer-controlled player for first-person shooter video games, in which we applied genetic algorithms to evolve the best dodge rules. This paper reports the results obtained by our bot during the first competition, held in Trondheim, Norway, at the IEEE Congress on Evolutionary Computation (CEC 2009).
{"title":"Chuck Norris rocks!","authors":"D. Cuadrado, Y. Sáez","doi":"10.1109/CIG.2009.5286493","DOIUrl":"https://doi.org/10.1109/CIG.2009.5286493","url":null,"abstract":"In this introductory work we present our first approach to a computer controller player for first-person shooter videogames, where we have applied genetic algorithms in order to evolve the best dodge rules. This paper is a report of the results obtained by our bot during the first competition held in Trondheim, Norway, during the IEEE Congress on Evolutionary Computation (CEC 2009).","PeriodicalId":358795,"journal":{"name":"2009 IEEE Symposium on Computational Intelligence and Games","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-09-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128175966","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Capturing augmented sensing capabilities and intrusion delay in patrolling-intrusion games
Pub Date: 2009-09-07 | DOI: 10.1109/CIG.2009.5286477
Nicola Basilico, N. Gatti, Thomas Rossi
Patrolling-intrusion games have recently been receiving more and more attention in the literature. They are two-player non-zero-sum games in which an intruder tries to attack one place of interest and one or more patrollers try to capture the intruder. The patroller cannot completely cover the environment by following a cycle; otherwise the intruder would successfully strike at least one target. Thus, the patroller employs a randomized strategy. These games are usually studied as leader-follower games, where the patroller is the leader and the intruder is the follower. The models proposed in the state of the art so far have several limitations that prevent their employment in realistic settings. In this paper we refine the state-of-the-art models to capture the patroller's augmented sensing capabilities and a possible delay in the intrusion, propose algorithms to solve our extensions efficiently, and experimentally evaluate the computation time in several case studies.
{"title":"Capturing augmented sensing capabilities and intrusion delay in patrolling-intrusion games","authors":"Nicola Basilico, N. Gatti, Thomas Rossi","doi":"10.1109/CIG.2009.5286477","DOIUrl":"https://doi.org/10.1109/CIG.2009.5286477","url":null,"abstract":"Patrolling-intrusion games are recently receiving more and more attention in the literature. They are twoplayer non zero-sum games where an intruder tries to attack one place of interest and one patroller (or more) tries to capture the intruder. The patroller cannot completely cover the environment following a cycle, otherwise the intruder will successfully strike at least a target. Thus, the patroller employs a randomized strategy. These games are usually studied as leader-follower games, where the patroller is the leader and the intruder is the follower. The models proposed in the state of the art so far present several limitations that prevent their employment in realistic settings. In this paper, we refine the models from the state-of-the-art capturing patroller's augmented sensing capabilities and a possible delay in the intrusion, we propose algorithms to solve efficiently our extensions, and we experimentally evaluate the computational time in some case studies.","PeriodicalId":358795,"journal":{"name":"2009 IEEE Symposium on Computational Intelligence and Games","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-09-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128695642","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Optimal strategy selection of non-player character on real time strategy game using a speciated evolutionary algorithm
Pub Date: 2009-09-07 | DOI: 10.1109/CIG.2009.5286490
Su-Hyung Jang, Jongwon Yoon, Sung-Bae Cho
In real-time strategy games, the success of the AI depends on continuous and effective decision making about the actions of NPCs in the game. Many researchers have accordingly tried to find the optimal choice. This paper demonstrates improved NPC performance in a real-time strategy game by using a speciated evolutionary algorithm, which has largely been applied to classification problems, for such action decisions. The members used in this ensemble method are created and selected through speciation, and the performance is verified on 'Conqueror', a real-time strategy game platform developed in our previous work.
{"title":"Optimal strategy selection of non-player character on real time strategy game using a speciated evolutionary algorithm","authors":"Su-Hyung Jang, Jongwon Yoon, Sung-Bae Cho","doi":"10.1109/CIG.2009.5286490","DOIUrl":"https://doi.org/10.1109/CIG.2009.5286490","url":null,"abstract":"In the real-time strategy game, success of AI depends on consecutive and effective decision making on actions by NPCs in the game. In this regard, there have been many researchers to find the optimized choice. This paper confirms the improvement of NPC performance in a real-time strategy game by using the speciated evolutionary algorithm for such decision making on actions, which has been largely applied to the classification problems. Creation and selection of members to use for this ensemble method is manifested through speciation and the performance is verified through ‘conqueror’, a real-time strategy game platform developed by our previous work.","PeriodicalId":358795,"journal":{"name":"2009 IEEE Symposium on Computational Intelligence and Games","volume":"95 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-09-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114575109","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}