Using plan-based reward shaping to learn strategies in StarCraft: Broodwar

Kyriakos Efthymiadis, D. Kudenko
{"title":"在《星际争霸:母巢之战》中使用基于计划的奖励塑造来学习策略","authors":"Kyriakos Efthymiadis, D. Kudenko","doi":"10.1109/CIG.2013.6633622","DOIUrl":null,"url":null,"abstract":"StarCraft: Broodwar (SC:BW) is a very popular commercial real strategy game (RTS) which has been extensively used in AI research. Despite being a popular test-bed reinforcement learning (RL) has not been evaluated extensively. A successful attempt was made to show the use of RL in a small-scale combat scenario involving an overpowered agent battling against multiple enemy units [1]. However, the chosen scenario was very small and not representative of the complexity of the game in its entirety. In order to build an RL agent that can manage the complexity of the full game, more efficient approaches must be used to tackle the state-space explosion. In this paper, we demonstrate how plan-based reward shaping can help an agent scale up to larger, more complex scenarios and significantly speed up the learning process as well as how high level planning can be combined with learning focusing on learning the Starcraft strategy, Battlecruiser Rush. We empirically show that the agent with plan-based reward shaping is significantly better both in terms of the learnt policy, as well as convergence speed when compared to baseline approaches which fail at reaching a good enough policy within a practical amount of time.","PeriodicalId":158902,"journal":{"name":"2013 IEEE Conference on Computational Inteligence in Games (CIG)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"24","resultStr":"{\"title\":\"Using plan-based reward shaping to learn strategies in StarCraft: Broodwar\",\"authors\":\"Kyriakos Efthymiadis, D. Kudenko\",\"doi\":\"10.1109/CIG.2013.6633622\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"StarCraft: Broodwar (SC:BW) is a very popular commercial real strategy game (RTS) which has been extensively used in AI research. Despite being a popular test-bed reinforcement learning (RL) has not been evaluated extensively. A successful attempt was made to show the use of RL in a small-scale combat scenario involving an overpowered agent battling against multiple enemy units [1]. However, the chosen scenario was very small and not representative of the complexity of the game in its entirety. In order to build an RL agent that can manage the complexity of the full game, more efficient approaches must be used to tackle the state-space explosion. In this paper, we demonstrate how plan-based reward shaping can help an agent scale up to larger, more complex scenarios and significantly speed up the learning process as well as how high level planning can be combined with learning focusing on learning the Starcraft strategy, Battlecruiser Rush. 
We empirically show that the agent with plan-based reward shaping is significantly better both in terms of the learnt policy, as well as convergence speed when compared to baseline approaches which fail at reaching a good enough policy within a practical amount of time.\",\"PeriodicalId\":158902,\"journal\":{\"name\":\"2013 IEEE Conference on Computational Inteligence in Games (CIG)\",\"volume\":\"15 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-10-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"24\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2013 IEEE Conference on Computational Inteligence in Games (CIG)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CIG.2013.6633622\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 IEEE Conference on Computational Inteligence in Games (CIG)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CIG.2013.6633622","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 24

Abstract

StarCraft: Broodwar (SC:BW) is a very popular commercial real-time strategy (RTS) game which has been used extensively in AI research. Despite being a popular test-bed, reinforcement learning (RL) has not been evaluated on it extensively. A successful attempt was made to show the use of RL in a small-scale combat scenario involving an overpowered agent battling against multiple enemy units [1]. However, the chosen scenario was very small and not representative of the complexity of the game in its entirety. In order to build an RL agent that can manage the complexity of the full game, more efficient approaches must be used to tackle the state-space explosion. In this paper, we demonstrate how plan-based reward shaping can help an agent scale up to larger, more complex scenarios and significantly speed up the learning process, and how high-level planning can be combined with learning, focusing on learning the StarCraft strategy known as the Battlecruiser Rush. We empirically show that the agent with plan-based reward shaping is significantly better, both in terms of the learnt policy and convergence speed, than baseline approaches, which fail to reach a good enough policy within a practical amount of time.
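To make the core idea concrete, below is a minimal sketch of plan-based reward shaping as potential-based shaping, where the agent's progress through a high-level plan defines the potential function. The hard-coded Battlecruiser Rush milestones, the plan_step/potential/shaped_reward helpers, and the GAMMA/OMEGA constants are all illustrative assumptions, not the authors' implementation; the paper derives its plan with a planner and runs the agent inside SC:BW.

```python
# A minimal sketch of plan-based reward shaping (assumed constants and plan;
# not the authors' code). Progress through a high-level plan defines the
# potential, and the shaping term is added to the environment reward.

GAMMA = 0.99   # discount factor (assumed)
OMEGA = 100.0  # scaling of the plan-progress potential (assumed)

# Illustrative high-level plan for a Battlecruiser Rush, as ordered milestones.
PLAN = [
    "build_refinery",
    "build_factory",
    "build_starport",
    "build_physics_lab",
    "train_battlecruisers",
    "attack_enemy_base",
]

def plan_step(abstract_state):
    """Index of the furthest plan milestone achieved in the agent's
    abstract state (a set of achieved facts); 0 if none."""
    step = 0
    for i, milestone in enumerate(PLAN, start=1):
        if milestone in abstract_state:
            step = i
    return step

def potential(abstract_state):
    # Potential grows with progress through the plan.
    return OMEGA * plan_step(abstract_state)

def shaped_reward(env_reward, s_abstract, s_next_abstract):
    """Potential-based shaping (Ng et al., 1999): adding
    F = gamma * phi(s') - phi(s) preserves the optimal policy."""
    shaping = GAMMA * potential(s_next_abstract) - potential(s_abstract)
    return env_reward + shaping
```

For example, a transition that completes the next milestone earns a positive shaping bonus, steering exploration along the plan even when the environment reward is zero:

```python
s  = {"build_refinery", "build_factory"}
s2 = {"build_refinery", "build_factory", "build_starport"}
print(shaped_reward(0.0, s, s2))  # 97.0 with the constants above
```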