{"title":"具有不同质量评价函数的PUCT算法的实证分析","authors":"Kiminori Matsuzaki","doi":"10.1109/TAAI.2018.00043","DOIUrl":null,"url":null,"abstract":"Monte-Carlo tree search (MCTS) algorithms play an important role in developing computer players for many games. The performance of MCTS players is often leveraged in combination with offline knowledge, i.e., evaluation functions. In particular, recently AlphaGo and AlphaGo Zero achieved a big success in developing strong computer Go player by combining evaluation functions consisting of deep neural networks with a variant of PUCT (Predictor + UCB applied to trees). The effect of evaluation functions on the strength of MCTS algorithms, however, has not been investigated well, especially in terms of the quality of evaluation functions. In this study, we address this issue and empirically analyze the AlphaGo's PUCT algorithm by using Othello (Reversi) as the target game. We investigate the strength of PUCT players using variants of an existing evaluation function of a champion-level computer player. From intensive experiments, we found that the PUCT algorithm works very well especially with a good evaluation function and that the value function has more importance than the policy function in the PUCT algorithm.","PeriodicalId":211734,"journal":{"name":"2018 Conference on Technologies and Applications of Artificial Intelligence (TAAI)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":"{\"title\":\"Empirical Analysis of PUCT Algorithm with Evaluation Functions of Different Quality\",\"authors\":\"Kiminori Matsuzaki\",\"doi\":\"10.1109/TAAI.2018.00043\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Monte-Carlo tree search (MCTS) algorithms play an important role in developing computer players for many games. The performance of MCTS players is often leveraged in combination with offline knowledge, i.e., evaluation functions. In particular, recently AlphaGo and AlphaGo Zero achieved a big success in developing strong computer Go player by combining evaluation functions consisting of deep neural networks with a variant of PUCT (Predictor + UCB applied to trees). The effect of evaluation functions on the strength of MCTS algorithms, however, has not been investigated well, especially in terms of the quality of evaluation functions. In this study, we address this issue and empirically analyze the AlphaGo's PUCT algorithm by using Othello (Reversi) as the target game. We investigate the strength of PUCT players using variants of an existing evaluation function of a champion-level computer player. From intensive experiments, we found that the PUCT algorithm works very well especially with a good evaluation function and that the value function has more importance than the policy function in the PUCT algorithm.\",\"PeriodicalId\":211734,\"journal\":{\"name\":\"2018 Conference on Technologies and Applications of Artificial Intelligence (TAAI)\",\"volume\":\"25 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"5\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2018 Conference on Technologies and Applications of Artificial Intelligence (TAAI)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/TAAI.2018.00043\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 Conference on Technologies and Applications of Artificial Intelligence (TAAI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/TAAI.2018.00043","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Empirical Analysis of PUCT Algorithm with Evaluation Functions of Different Quality
Monte-Carlo tree search (MCTS) algorithms play an important role in developing computer players for many games. The performance of MCTS players is often leveraged in combination with offline knowledge, i.e., evaluation functions. In particular, recently AlphaGo and AlphaGo Zero achieved a big success in developing strong computer Go player by combining evaluation functions consisting of deep neural networks with a variant of PUCT (Predictor + UCB applied to trees). The effect of evaluation functions on the strength of MCTS algorithms, however, has not been investigated well, especially in terms of the quality of evaluation functions. In this study, we address this issue and empirically analyze the AlphaGo's PUCT algorithm by using Othello (Reversi) as the target game. We investigate the strength of PUCT players using variants of an existing evaluation function of a champion-level computer player. From intensive experiments, we found that the PUCT algorithm works very well especially with a good evaluation function and that the value function has more importance than the policy function in the PUCT algorithm.