{"title":"在《星际争霸:母巢之战》中使用基于计划的奖励塑造来学习策略","authors":"Kyriakos Efthymiadis, D. Kudenko","doi":"10.1109/CIG.2013.6633622","DOIUrl":null,"url":null,"abstract":"StarCraft: Broodwar (SC:BW) is a very popular commercial real strategy game (RTS) which has been extensively used in AI research. Despite being a popular test-bed reinforcement learning (RL) has not been evaluated extensively. A successful attempt was made to show the use of RL in a small-scale combat scenario involving an overpowered agent battling against multiple enemy units [1]. However, the chosen scenario was very small and not representative of the complexity of the game in its entirety. In order to build an RL agent that can manage the complexity of the full game, more efficient approaches must be used to tackle the state-space explosion. In this paper, we demonstrate how plan-based reward shaping can help an agent scale up to larger, more complex scenarios and significantly speed up the learning process as well as how high level planning can be combined with learning focusing on learning the Starcraft strategy, Battlecruiser Rush. We empirically show that the agent with plan-based reward shaping is significantly better both in terms of the learnt policy, as well as convergence speed when compared to baseline approaches which fail at reaching a good enough policy within a practical amount of time.","PeriodicalId":158902,"journal":{"name":"2013 IEEE Conference on Computational Inteligence in Games (CIG)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"24","resultStr":"{\"title\":\"Using plan-based reward shaping to learn strategies in StarCraft: Broodwar\",\"authors\":\"Kyriakos Efthymiadis, D. Kudenko\",\"doi\":\"10.1109/CIG.2013.6633622\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"StarCraft: Broodwar (SC:BW) is a very popular commercial real strategy game (RTS) which has been extensively used in AI research. Despite being a popular test-bed reinforcement learning (RL) has not been evaluated extensively. A successful attempt was made to show the use of RL in a small-scale combat scenario involving an overpowered agent battling against multiple enemy units [1]. However, the chosen scenario was very small and not representative of the complexity of the game in its entirety. In order to build an RL agent that can manage the complexity of the full game, more efficient approaches must be used to tackle the state-space explosion. In this paper, we demonstrate how plan-based reward shaping can help an agent scale up to larger, more complex scenarios and significantly speed up the learning process as well as how high level planning can be combined with learning focusing on learning the Starcraft strategy, Battlecruiser Rush. 
We empirically show that the agent with plan-based reward shaping is significantly better both in terms of the learnt policy, as well as convergence speed when compared to baseline approaches which fail at reaching a good enough policy within a practical amount of time.\",\"PeriodicalId\":158902,\"journal\":{\"name\":\"2013 IEEE Conference on Computational Inteligence in Games (CIG)\",\"volume\":\"15 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-10-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"24\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2013 IEEE Conference on Computational Inteligence in Games (CIG)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CIG.2013.6633622\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 IEEE Conference on Computational Inteligence in Games (CIG)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CIG.2013.6633622","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Using plan-based reward shaping to learn strategies in StarCraft: Broodwar
StarCraft: Broodwar (SC:BW) is a very popular commercial real-time strategy (RTS) game which has been used extensively in AI research. Despite being a popular test-bed, reinforcement learning (RL) has not been evaluated on it extensively. A successful attempt was made to show the use of RL in a small-scale combat scenario involving an overpowered agent battling against multiple enemy units [1]. However, the chosen scenario was very small and not representative of the complexity of the game in its entirety. In order to build an RL agent that can manage the complexity of the full game, more efficient approaches must be used to tackle the state-space explosion. In this paper, we demonstrate how plan-based reward shaping can help an agent scale up to larger, more complex scenarios and significantly speed up the learning process, and how high-level planning can be combined with learning, focusing on the StarCraft strategy Battlecruiser Rush. We empirically show that the agent with plan-based reward shaping is significantly better, both in terms of the learnt policy and convergence speed, than baseline approaches, which fail to reach a good enough policy within a practical amount of time.
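To make the idea concrete, below is a minimal sketch of plan-based reward shaping under the usual potential-based formulation F(s, s') = γΦ(s') − Φ(s), where the potential Φ is driven by progress through a hand-written high-level plan. The specific plan steps, state encoding, scaling constant OMEGA, and learning-rate values are illustrative assumptions for a Battlecruiser Rush, not the paper's exact formulation.

```python
# Sketch: plan-based reward shaping on top of tabular Q-learning.
# Assumptions (not from the paper): state is a tuple of counts
# (barracks, factories, starports, physics_labs, battlecruisers),
# and the plan is an ordered list of predicates over that tuple.
from collections import defaultdict

PLAN = [
    lambda s: s[0] >= 1,   # build a Barracks
    lambda s: s[1] >= 1,   # build a Factory
    lambda s: s[2] >= 1,   # build a Starport
    lambda s: s[3] >= 1,   # attach a Physics Lab
    lambda s: s[4] >= 4,   # amass Battlecruisers
]
OMEGA = 100.0              # potential scale (assumed)
GAMMA, ALPHA = 0.99, 0.1   # discount factor and learning rate (assumed)

def potential(state):
    """Phi(s): fraction of consecutive plan steps already satisfied, scaled by OMEGA."""
    done = 0
    for step_satisfied in PLAN:
        if step_satisfied(state):
            done += 1
        else:
            break
    return OMEGA * done / len(PLAN)

Q = defaultdict(float)

def q_learning_update(s, a, env_reward, s_next, actions):
    """One tabular Q-learning step with the shaping reward added to the environment reward."""
    shaping = GAMMA * potential(s_next) - potential(s)          # F(s, s')
    target = env_reward + shaping + GAMMA * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])
```

Because the shaping term is potential-based, it changes only the speed of learning, not the set of optimal policies: the agent is rewarded for moving forward through the plan and penalised for regressing, which is what allows it to cope with the much larger state space of the full strategy.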