Multistage Temporal Difference Learning for 2048-Like Games

Q2 Computer Science · IEEE Transactions on Computational Intelligence and AI in Games · Pub Date: 2016-06-23 · DOI: 10.1109/TCIAIG.2016.2593710

Kun-Hao Yeh, I-Chen Wu, Chu-Hsuan Hsueh, Chia-Chuan Chang, Chao-Chin Liang, Chiang Han

IEEE Transactions on Computational Intelligence and AI in Games, vol. 9, pp. 369-380. Journal Article. https://doi.org/10.1109/TCIAIG.2016.2593710

Citations: 21

Abstract

Szubert and Jaśkowski successfully used temporal difference (TD) learning together with n-tuple networks to play the game 2048. However, we observed that programs based on TD learning still rarely reach large tiles. In this paper, we propose multistage TD (MS-TD) learning, a hierarchical reinforcement learning method, to improve the rates of reaching large tiles, which are good metrics for the strength of 2048 programs. Our experiments showed significant improvements over a program without MS-TD learning. Namely, using 3-ply expectimax search, the program with MS-TD learning reached 32768-tiles at a rate of 18.31%, while the one with TD learning did not reach any. After further tuning, our 2048 program reached 32768-tiles at a rate of 31.75% in 10,000 games, and one of these games even reached a 65536-tile, which to our knowledge is the first time a 65536-tile has been reached. In addition, the MS-TD learning method can easily be applied to other 2048-like games, such as Threes. Our experiments with Threes demonstrated a similar improvement: the program with MS-TD learning reached 6144-tiles at a rate of 7.83%, while the one with TD learning reached them at a rate of only 0.45%.
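The abstract does not spell out how the stages are defined or how the n-tuple networks are laid out, so the following is only a minimal sketch of the general idea: one set of n-tuple weight tables per stage, with a TD(0) update applied to the tables of the stage the current board belongs to. The tuple shapes (`TUPLES`), the stage boundary (`STAGE_THRESHOLDS`), the even split of the step size across tuples, and all class and function names are illustrative assumptions, not the authors' configuration.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# 16 cells in row-major order; each cell holds a tile exponent (0 = empty).
Board = Tuple[int, ...]

# Assumed tuple shapes: the four rows of the 4x4 board (illustrative only).
TUPLES: List[Tuple[int, ...]] = [tuple(range(r * 4, r * 4 + 4)) for r in range(4)]

# Assumed stage boundary, as an exponent of the largest tile (2**14 = 16384).
STAGE_THRESHOLDS: Sequence[int] = (14,)


def stage_of(board: Board) -> int:
    """Return the stage index: the number of thresholds the largest tile has reached."""
    top = max(board)
    return sum(top >= t for t in STAGE_THRESHOLDS)


class MultistageNTupleValue:
    """One independent set of n-tuple weight tables per stage."""

    def __init__(self, n_stages: int, alpha: float = 0.1) -> None:
        self.alpha = alpha
        # weights[stage][tuple index] maps the tuple's cell contents to a weight.
        self.weights: List[List[Dict[Tuple[int, ...], float]]] = [
            [dict() for _ in TUPLES] for _ in range(n_stages)
        ]

    def value(self, board: Board) -> float:
        """Sum the weights selected by each tuple, using the board's own stage."""
        tables = self.weights[stage_of(board)]
        return sum(
            table.get(tuple(board[i] for i in cells), 0.0)
            for table, cells in zip(tables, TUPLES)
        )

    def td_update(self, board: Board, reward: float,
                  next_board: Optional[Board]) -> float:
        """TD(0) step: move V(board) toward reward + V(next_board).

        next_board is None at the end of a game. The successor may lie in a
        later stage, in which case its value comes from that stage's tables.
        """
        target = reward + (self.value(next_board) if next_board is not None else 0.0)
        error = target - self.value(board)
        tables = self.weights[stage_of(board)]
        step = self.alpha * error / len(TUPLES)  # split the step across tuples
        for table, cells in zip(tables, TUPLES):
            key = tuple(board[i] for i in cells)
            table[key] = table.get(key, 0.0) + step
        return error


# Usage: updates to an early-stage board leave the later stage untouched.
net = MultistageNTupleValue(n_stages=2, alpha=0.5)
early_board: Board = (1, 2, 3, 4) + (0,) * 12
late_board: Board = (14, 2, 3, 4) + (0,) * 12
net.td_update(early_board, reward=4.0, next_board=None)
```

Keeping the tables separate means that weights tuned for boards with very large tiles are not overwritten by the far more frequent early-game positions, which is one plausible reading of why a staged scheme would help reach large tiles.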




Source journal

IEEE Transactions on Computational Intelligence and AI in Games (Computer Science, Artificial Intelligence; Computer Science, Software Engineering)

CiteScore: 4.60

Self-citation rate: 0.00%

Review time: >12 weeks

Journal description: Cessation. The IEEE Transactions on Computational Intelligence and AI in Games (T-CIAIG) publishes archival journal quality original papers in computational intelligence and related areas in artificial intelligence applied to games, including but not limited to videogames, mathematical games, human–computer interactions in games, and games involving physical objects. Emphasis is placed on the use of these methods to improve performance in and understanding of the dynamics of games, as well as gaining insight into the properties of the methods as applied to games. It also includes using games as a platform for building intelligent embedded agents for the real world. Papers connecting games to all areas of computational intelligence and traditional AI are considered.