Monte-Carlo simulation balancing revisited

Tobias Graf, M. Platzner
{"title":"Monte-Carlo simulation balancing revisited","authors":"Tobias Graf, M. Platzner","doi":"10.1109/CIG.2016.7860411","DOIUrl":null,"url":null,"abstract":"Simulation Balancing is an optimization algorithm to automatically tune the parameters of a playout policy used inside a Monte Carlo Tree Search. The algorithm fits a policy so that the expected result of a policy matches given target values of the training set. Up to now it has been successfully applied to Computer Go on small 9 × 9 boards but failed for larger board sizes like 19 × 19. On these large boards apprenticeship learning, which fits a policy so that it closely follows an expert, continues to be the algorithm of choice. In this paper we introduce several improvements to the original simulation balancing algorithm and test their effectiveness in Computer Go. The proposed additions remove the necessity to generate target values by deep searches, optimize faster and make the algorithm less prone to overfitting. The experiments show that simulation balancing improves the playing strength of a Go program using apprenticeship learning by more than 200 ELO on the large board size 19 × 19.","PeriodicalId":6594,"journal":{"name":"2016 IEEE Conference on Computational Intelligence and Games (CIG)","volume":"20 1","pages":"1-7"},"PeriodicalIF":0.0000,"publicationDate":"2016-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 IEEE Conference on Computational Intelligence and Games (CIG)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CIG.2016.7860411","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4

Abstract

Simulation Balancing is an optimization algorithm to automatically tune the parameters of a playout policy used inside a Monte Carlo Tree Search. The algorithm fits a policy so that the expected result of a policy matches given target values of the training set. Up to now it has been successfully applied to Computer Go on small 9 × 9 boards but failed for larger board sizes like 19 × 19. On these large boards apprenticeship learning, which fits a policy so that it closely follows an expert, continues to be the algorithm of choice. In this paper we introduce several improvements to the original simulation balancing algorithm and test their effectiveness in Computer Go. The proposed additions remove the necessity to generate target values by deep searches, optimize faster and make the algorithm less prone to overfitting. The experiments show that simulation balancing improves the playing strength of a Go program using apprenticeship learning by more than 200 ELO on the large board size 19 × 19.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
蒙特卡罗模拟平衡重新审视
仿真平衡是一种优化算法,用于自动调整蒙特卡罗树搜索中使用的播放策略参数。该算法拟合策略,使策略的预期结果与给定训练集的目标值相匹配。到目前为止,它已经成功地应用于计算机围棋小9 × 9板,但失败的较大的棋盘尺寸,如19 × 19。在这些大型董事会中,学徒制学习仍然是首选算法,它符合一项政策,因此它密切跟随专家。本文对原有的仿真平衡算法进行了改进,并对其在计算机围棋中的有效性进行了测试。所提出的加法消除了通过深度搜索生成目标值的必要性,优化速度更快,并且使算法不容易出现过拟合。实验表明,在19 × 19的大棋盘上,模拟平衡使采用学徒学习的围棋程序的下棋强度提高了200个ELO以上。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Human gesture classification by brute-force machine learning for exergaming in physiotherapy Evolving micro for 3D Real-Time Strategy games Constrained surprise search for content generation Design influence on player retention: A method based on time varying survival analysis Deep Q-learning using redundant outputs in visual doom
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1