Online Adaptable Learning Rates for the Game Connect-4

IEEE Transactions on Computational Intelligence and AI in Games (Computer Science, Q2). Pub Date: 2016-03-01. DOI: 10.1109/TCIAIG.2014.2367105

Samineh Bagheri, Markus Thill, P. Koch, W. Konen

Volume 8, Issue 1, pp. 33-42

Citations: 25

Abstract

Learning board games by self-play has a long tradition in computational intelligence for games. Building on Tesauro's seminal success with TD-Gammon in 1994, many successful agents today use temporal difference learning. To succeed with temporal difference learning on game tasks, however, a careful selection of features and a large number of training games are often necessary. Even for board games of moderate complexity like Connect-4, we found in previous work that a very rich initial feature set and several million training games are required. In this work we investigate whether approaches with online-adaptable learning rates, such as Incremental Delta Bar Delta (IDBD) or temporal coherence learning (TCL), have the potential to speed up learning for such a complex task. We propose a new variant of TCL with geometric step size changes. We compare these algorithms with several other state-of-the-art learning rate adaptation algorithms and perform a case study on their sensitivity to their meta-parameters. We show that, within this set of learning algorithms, those with geometric step size changes outperform those with constant step size changes. Algorithms with nonlinear output functions are slightly better than linear ones. Algorithms with geometric step size changes learn faster by a factor of 4 compared with previously published results on the Connect-4 task.
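To make the idea of an online-adaptable learning rate concrete, the following is a minimal sketch of IDBD in its standard linear form as described by Sutton (1992). Each weight keeps its own step size, stored as a logarithm, so every adaptation multiplies the step size rather than adding to it. This is not the authors' Connect-4 setup: the function names `idbd_step` and `demo`, the meta step size `theta = 0.01`, the initial step size of 0.05, and the synthetic noise-free regression task are all assumptions chosen for illustration.

```python
import math
import random


def idbd_step(w, beta, h, x, target, theta=0.01):
    """One IDBD update for a linear unit y = w . x (lists are updated in place).

    w    : weights
    beta : log step sizes, one per weight
    h    : traces of recent weight updates, one per weight
    theta: meta step size (illustrative default, not taken from the paper)
    """
    delta = target - sum(wi * xi for wi, xi in zip(w, x))
    for i, xi in enumerate(x):
        # Meta update: shift the log step size according to how the current
        # gradient term (delta * xi) correlates with the update trace h[i].
        beta[i] += theta * delta * xi * h[i]
        alpha = math.exp(beta[i])  # per-weight step size, always positive
        w[i] += alpha * delta * xi
        # Decay the trace, then add the weight update that was just applied.
        h[i] = h[i] * max(0.0, 1.0 - alpha * xi * xi) + alpha * delta * xi
    return delta


def demo(steps=5000, seed=0):
    """Fit a hypothetical noise-free linear target and return (weights, step sizes)."""
    rng = random.Random(seed)
    true_w = [1.0, -2.0, 0.0, 0.0]  # assumed target weights for the toy task
    n = len(true_w)
    w = [0.0] * n
    beta = [math.log(0.05)] * n  # assumed initial step size of 0.05
    h = [0.0] * n
    for _ in range(steps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        target = sum(t * xi for t, xi in zip(true_w, x))
        idbd_step(w, beta, h, x, target)
    return w, [math.exp(b) for b in beta]


if __name__ == "__main__":
    weights, step_sizes = demo()
    print("weights:", weights)
    print("step sizes:", step_sizes)
```

Because the step size is `exp(beta[i])`, an additive change to `beta[i]` scales the step size by a constant factor, which is one way to read "geometric step size changes". In a temporal difference setting, `target` would be replaced by a bootstrapped value estimate; that substitution is not shown here.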

Source journal

IEEE Transactions on Computational Intelligence and AI in Games (Computer Science, Artificial Intelligence; Computer Science, Software Engineering)

CiteScore: 4.60

Self-citation rate: 0.00%

Review time: >12 weeks

Journal description: Cessation. The IEEE Transactions on Computational Intelligence and AI in Games (T-CIAIG) publishes archival-quality original papers in computational intelligence and related areas of artificial intelligence applied to games, including but not limited to video games, mathematical games, human–computer interactions in games, and games involving physical objects. Emphasis is placed on using these methods to improve performance in games and understanding of their dynamics, as well as on gaining insight into the properties of the methods as applied to games. The scope also includes using games as a platform for building intelligent embedded agents for the real world. Papers connecting games to all areas of computational intelligence and traditional AI are considered.