Enhancing deep reinforcement learning for scale flexibility in real-time strategy games

Entertainment Computing (IF 2.8; Zone 3, Computer Science; JCR Q2, Computer Science, Cybernetics). Pub Date: 2024-07-31. DOI: 10.1016/j.entcom.2024.100843

Marcelo Luiz Harry Diniz Lemos, Ronaldo e Silva Vieira, Anderson Rocha Tavares, Leandro Soriano Marcolino, Luiz Chaimowicz

Entertainment Computing, volume 52, Article 100843. Publisher page: https://www.sciencedirect.com/science/article/pii/S1875952124002118


Abstract

Real-time strategy (RTS) games present a unique challenge for AI agents due to the combination of several fundamental AI problems. While Deep Reinforcement Learning (DRL) has shown promise in the development of autonomous agents for the genre, existing architectures often struggle with games featuring maps of varying dimensions. This limitation hinders the agent’s ability to generalize its learned strategies across different scenarios. This paper proposes a novel approach that overcomes this problem by incorporating Spatial Pyramid Pooling (SPP) within a DRL framework. We leverage the GridNet architecture’s encoder–decoder structure and integrate an SPP layer into the critic network of the Proximal Policy Optimization (PPO) algorithm. This SPP layer dynamically generates a standardized representation of the game state, regardless of the initial observation size. This allows the agent to effectively adapt its decision-making process to any map configuration. Our evaluations demonstrate that the proposed method significantly enhances the model’s flexibility and efficiency in training agents for various RTS game scenarios, albeit with some discernible limitations when applied to very small maps. This approach paves the way for more robust and adaptable AI agents capable of excelling in sequential decision problems with variable-size observations.
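To illustrate how a pooling layer can yield a representation whose size does not depend on the map dimensions, here is a minimal sketch of spatial pyramid max pooling in plain Python. It is not the authors' implementation: the function name `spp_max_pool`, the helper `make_map`, the pyramid levels `(1, 2, 4)`, and the use of max pooling are all assumptions made for illustration. Each channel is divided into an n × n grid of bins for every pyramid level, and the maximum of each bin is kept, so the output length is always channels × (1 + 4 + 16) for these levels.

```python
def spp_max_pool(feature_map, levels=(1, 2, 4)):
    """Pool a variable-size feature map into a fixed-length vector.

    feature_map: list of C channels, each an H x W list of lists of numbers.
    levels: hypothetical pyramid levels; level n splits the map into n x n bins.
    Returns a flat list of length C * sum(n * n for n in levels).
    """
    pooled = []
    for channel in feature_map:
        height, width = len(channel), len(channel[0])
        for n in levels:
            for i in range(n):
                # Row range of bin i: floor for the start, ceiling for the end,
                # so every bin holds at least one row even when height < n.
                r0 = (i * height) // n
                r1 = ((i + 1) * height + n - 1) // n
                for j in range(n):
                    c0 = (j * width) // n
                    c1 = ((j + 1) * width + n - 1) // n
                    pooled.append(
                        max(channel[r][c]
                            for r in range(r0, r1)
                            for c in range(c0, c1))
                    )
    return pooled


def make_map(channels, height, width):
    """Build a toy feature map with distinct, increasing values per channel."""
    return [
        [[ch + r * width + c for c in range(width)] for r in range(height)]
        for ch in range(channels)
    ]


small = spp_max_pool(make_map(3, 8, 8))    # e.g. features of an 8 x 8 map
large = spp_max_pool(make_map(3, 16, 24))  # e.g. features of a 16 x 24 map
print(len(small), len(large))  # both are 3 * (1 + 4 + 16) = 63
```

In this sketch, when an input is smaller than the finest pyramid level (for example a 2 × 2 map with a level of 4), neighbouring bins cover the same cells and repeat values. That is a property of this toy code and is not offered as an account of the small-map limitations mentioned in the abstract.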

Source journal: Entertainment Computing (Computer Science: Human-Computer Interaction)

CiteScore: 5.90

Self-citation rate: 7.10%


About the journal: Entertainment Computing publishes original, peer-reviewed research articles and serves as a forum for stimulating and disseminating innovative research ideas, emerging technologies, empirical investigations, state-of-the-art methods and tools in all aspects of digital entertainment, new media, entertainment computing, gaming, robotics, toys and applications among researchers, engineers, social scientists, artists and practitioners. Theoretical, technical, empirical, survey articles and case studies are all appropriate to the journal.