Descent wins five gold medals at the Computer Olympiad

Quentin Cohen-Solal, T. Cazenave
{"title":"Descent在计算机奥林匹克竞赛中获得五枚金牌","authors":"Quentin Cohen-Solal, T. Cazenave","doi":"10.3233/icg-210192","DOIUrl":null,"url":null,"abstract":"Unlike AlphaZero-like algorithms (Silver et al., 2018), the Descent framework uses a variant of Unbounded Minimax (Korf and Chickering, 1996), instead of Monte Carlo Tree Search, to construct the partial game tree used to determine the best action to play and to collect data for learning. During training, at each move, the best sequences of moves are iteratively extended until terminal states. During evaluations, the safest action is chosen (after that the best sequences of moves are iteratively extended each until a leaf state is reached). Moreover, it also does not use a policy network, only a value network. The actions therefore do not need to be encoded. Unlike the AlphaZero paradigm, with Descent all data generated during the searches to determine the best actions to play is used for learning. As a result, much more data is generated per game, and thus the training is done more quickly and does not require a (massive) parallelization to give good results (contrary to AlphaZero). It can use end-of-game heuristic evaluation to improve its level of play faster, such as game score or game length (in order to win quickly and lose slowly).","PeriodicalId":14829,"journal":{"name":"J. Int. Comput. Games Assoc.","volume":"60 3","pages":"132-134"},"PeriodicalIF":0.0000,"publicationDate":"2021-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Descent wins five gold medals at the Computer Olympiad\",\"authors\":\"Quentin Cohen-Solal, T. Cazenave\",\"doi\":\"10.3233/icg-210192\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Unlike AlphaZero-like algorithms (Silver et al., 2018), the Descent framework uses a variant of Unbounded Minimax (Korf and Chickering, 1996), instead of Monte Carlo Tree Search, to construct the partial game tree used to determine the best action to play and to collect data for learning. During training, at each move, the best sequences of moves are iteratively extended until terminal states. During evaluations, the safest action is chosen (after that the best sequences of moves are iteratively extended each until a leaf state is reached). Moreover, it also does not use a policy network, only a value network. The actions therefore do not need to be encoded. Unlike the AlphaZero paradigm, with Descent all data generated during the searches to determine the best actions to play is used for learning. As a result, much more data is generated per game, and thus the training is done more quickly and does not require a (massive) parallelization to give good results (contrary to AlphaZero). It can use end-of-game heuristic evaluation to improve its level of play faster, such as game score or game length (in order to win quickly and lose slowly).\",\"PeriodicalId\":14829,\"journal\":{\"name\":\"J. Int. Comput. Games Assoc.\",\"volume\":\"60 3\",\"pages\":\"132-134\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-10-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"J. Int. Comput. 
Games Assoc.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.3233/icg-210192\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"J. Int. Comput. Games Assoc.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3233/icg-210192","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Unlike AlphaZero-like algorithms (Silver et al., 2018), the Descent framework uses a variant of Unbounded Minimax (Korf and Chickering, 1996), instead of Monte Carlo Tree Search, to construct the partial game tree used both to determine the best action to play and to collect data for learning. During training, at each move, the best sequences of moves are iteratively extended until they reach terminal states. During evaluations, the best sequences of moves are each iteratively extended until a leaf state is reached, after which the safest action is chosen. Moreover, Descent does not use a policy network, only a value network, so actions do not need to be encoded. Unlike the AlphaZero paradigm, with Descent all the data generated during the searches that determine the best actions to play is used for learning. As a result, much more data is generated per game, so training proceeds more quickly and does not require (massive) parallelization to give good results, contrary to AlphaZero. Descent can also use end-of-game heuristic evaluations, such as the game score or the game length (so as to win quickly and lose slowly), to improve its level of play faster.
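
To make the search-and-learn loop concrete, here is a minimal sketch of one Descent-style iteration, not the authors' implementation: it assumes a hypothetical Game interface (children(), is_terminal(), terminal_value()) and a value-network callable v mapping a state to a scalar from the point of view of the player to move. It illustrates the two properties stressed in the abstract: the best sequence of moves is extended all the way to a terminal state, and every backed-up value encountered during the search is recorded as a training target.

```python
# Minimal sketch (hypothetical interface, negamax convention) of one
# Descent-style search iteration, as described in the abstract above.

def descent_iteration(state, v, tree, data):
    """Extend the current best sequence of moves from `state` until a
    terminal state, backing up minimax values and recording every
    backed-up value as a training target for the value network."""
    if state.is_terminal():
        # Exact end-of-game value; an end-of-game heuristic such as game
        # score or game length could be folded in here ("win quickly,
        # lose slowly").
        value = state.terminal_value()
    else:
        if state not in tree:
            # First visit: expand all children and evaluate them with the
            # value network, negated to the current player's point of view.
            tree[state] = {child: -v(child) for child in state.children()}
        children = tree[state]
        best = max(children, key=children.get)          # current best move
        children[best] = -descent_iteration(best, v, tree, data)
        value = max(children.values())                  # minimax backup
    data.append((state, value))  # all search data is reused for learning
    return value
```

During training, such an iteration would be run repeatedly from the root before the chosen move is played, and every (state, value) pair in data would then serve as a regression target for the value network. This is why Descent generates far more training examples per game than AlphaZero-style self-play, which learns only from the states actually played.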