{"title":"用时间差分学习构造静态评价函数","authors":"Samuel Choi Ping Man","doi":"10.18495/COMENGAPP.V2I1.18","DOIUrl":null,"url":null,"abstract":"Programming computers to play board games against human players has long been used as a measure for the development of artificial intelligence. The standard approach for computer game playing is to search for the best move from a given game state by using minimax search with static evaluation function. The static evaluation function is critical to the game playing performance but its design often relies on human expert players. This paper discusses how temporal differences (TD) learning can be used to construct a static evaluation function through self-playing and evaluates the effects for various parameter settings. The game of Kalah, a non-chance game of moderate complexity, is chosen as a testbed. The empirical result shows that TD learning is particularly promising for constructing a good evaluation function for the end games and can substantially improve the overall game playing performance in learning the entire game. DOI: 10.18495/comengapp.21.175184","PeriodicalId":120500,"journal":{"name":"Computer Engineering and Applications","volume":"200 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"On Constructing Static Evaluation Function using Temporal Difference Learning\",\"authors\":\"Samuel Choi Ping Man\",\"doi\":\"10.18495/COMENGAPP.V2I1.18\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Programming computers to play board games against human players has long been used as a measure for the development of artificial intelligence. The standard approach for computer game playing is to search for the best move from a given game state by using minimax search with static evaluation function. The static evaluation function is critical to the game playing performance but its design often relies on human expert players. This paper discusses how temporal differences (TD) learning can be used to construct a static evaluation function through self-playing and evaluates the effects for various parameter settings. The game of Kalah, a non-chance game of moderate complexity, is chosen as a testbed. The empirical result shows that TD learning is particularly promising for constructing a good evaluation function for the end games and can substantially improve the overall game playing performance in learning the entire game. 
DOI: 10.18495/comengapp.21.175184\",\"PeriodicalId\":120500,\"journal\":{\"name\":\"Computer Engineering and Applications\",\"volume\":\"200 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-03-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computer Engineering and Applications\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.18495/COMENGAPP.V2I1.18\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer Engineering and Applications","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.18495/COMENGAPP.V2I1.18","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
On Constructing Static Evaluation Function using Temporal Difference Learning
Programming computers to play board games against human players has long been used as a measure of progress in artificial intelligence. The standard approach to computer game playing is to search for the best move from a given game state using minimax search with a static evaluation function. The static evaluation function is critical to playing performance, but its design often relies on human expert players. This paper discusses how temporal difference (TD) learning can be used to construct a static evaluation function through self-play and evaluates the effects of various parameter settings. The game of Kalah, a non-chance game of moderate complexity, is chosen as a testbed. The empirical results show that TD learning is particularly promising for constructing a good evaluation function for end games and can substantially improve overall playing performance when learning the entire game.

DOI: 10.18495/comengapp.21.175184
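
The abstract does not give implementation details, but the core idea it describes, namely learning the weights of a static evaluation function from self-play with temporal difference updates, can be sketched as follows. This is a minimal illustration assuming a linear evaluator over hand-chosen board features and the TD(0) update rule; the helper names (kalah_features, legal_moves, apply_move, is_terminal, outcome) are hypothetical placeholders rather than the paper's actual code, and the learning rate is an assumed value.

```python
ALPHA = 0.1  # learning rate (assumed value, not from the paper)
GAMMA = 1.0  # no discounting within a single game (assumed)

def evaluate(weights, features):
    """Linear static evaluation: a weighted sum of board features."""
    return sum(w * f for w, f in zip(weights, features))

def td0_self_play_episode(weights, initial_state, kalah_features,
                          legal_moves, apply_move, is_terminal, outcome):
    """Play one self-play game and update the evaluator with the TD(0) rule:
    w <- w + alpha * (V(s') - V(s)) * grad_w V(s),
    where the gradient of a linear evaluator is just the feature vector."""
    state = initial_state
    while not is_terminal(state):
        # Greedy self-play move selection using the current evaluator.
        move = max(legal_moves(state),
                   key=lambda m: evaluate(weights,
                                          kalah_features(apply_move(state, m))))
        next_state = apply_move(state, move)

        v = evaluate(weights, kalah_features(state))
        # At a terminal position the target is the true game outcome
        # (e.g. +1 win, -1 loss, 0 draw); otherwise it is the bootstrapped
        # estimate of the successor position.
        if is_terminal(next_state):
            target = outcome(next_state)
        else:
            target = GAMMA * evaluate(weights, kalah_features(next_state))

        td_error = target - v
        feats = kalah_features(state)
        weights = [w + ALPHA * td_error * f for w, f in zip(weights, feats)]
        state = next_state
    return weights
```

In the setup the abstract describes, the learned evaluate() would then serve as the leaf evaluator inside a minimax (or alpha-beta) search, replacing a hand-designed evaluation function; repeating td0_self_play_episode over many self-play games refines the weights toward a better estimate of each position's value.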