Learning to Perceive in Deep Model-Free Reinforcement Learning

Gonçalo Querido, Alberto Sardinha, Francisco S. Melo
{"title":"Learning to Perceive in Deep Model-Free Reinforcement Learning","authors":"Gonccalo Querido, Alberto Sardinha, Francisco S. Melo","doi":"10.48550/arXiv.2301.03730","DOIUrl":null,"url":null,"abstract":"This work proposes a novel model-free Reinforcement Learning (RL) agent that is able to learn how to complete an unknown task having access to only a part of the input observation. We take inspiration from the concepts of visual attention and active perception that are characteristic of humans and tried to apply them to our agent, creating a hard attention mechanism. In this mechanism, the model decides first which region of the input image it should look at, and only after that it has access to the pixels of that region. Current RL agents do not follow this principle and we have not seen these mechanisms applied to the same purpose as this work. In our architecture, we adapt an existing model called recurrent attention model (RAM) and combine it with the proximal policy optimization (PPO) algorithm. We investigate whether a model with these characteristics is capable of achieving similar performance to state-of-the-art model-free RL agents that access the full input observation. This analysis is made in two Atari games, Pong and SpaceInvaders, which have a discrete action space, and in CarRacing, which has a continuous action space. Besides assessing its performance, we also analyze the movement of the attention of our model and compare it with what would be an example of the human behavior. Even with such visual limitation, we show that our model matches the performance of PPO+LSTM in two of the three games tested.","PeriodicalId":326727,"journal":{"name":"Adaptive Agents and Multi-Agent Systems","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2023-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Adaptive Agents and Multi-Agent Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2301.03730","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

This work proposes a novel model-free Reinforcement Learning (RL) agent that learns to complete an unknown task while having access to only part of the input observation. We take inspiration from the concepts of visual attention and active perception that are characteristic of humans and apply them to our agent, creating a hard attention mechanism. In this mechanism, the model first decides which region of the input image it should look at, and only then gains access to the pixels of that region. Current RL agents do not follow this principle, and we have not seen these mechanisms applied to the same purpose as in this work. In our architecture, we adapt an existing model, the recurrent attention model (RAM), and combine it with the proximal policy optimization (PPO) algorithm. We investigate whether a model with these characteristics can achieve performance similar to state-of-the-art model-free RL agents that access the full input observation. This analysis is conducted on two Atari games, Pong and SpaceInvaders, which have a discrete action space, and on CarRacing, which has a continuous action space. Besides assessing its performance, we also analyze how the model's attention moves and compare it with an example of human behavior. Even with this visual limitation, we show that our model matches the performance of PPO+LSTM in two of the three games tested.
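The abstract describes the agent's core loop (choose a location, read only that region, then act) but gives no implementation details. The sketch below is a minimal, hypothetical illustration of such a hard-attention glimpse step wired to PPO-style policy and value heads in PyTorch; it is not the authors' implementation. The glimpse size, the GRU core (the paper's baseline is PPO+LSTM), the names `HardAttentionActorCritic` and `extract_glimpse`, and all layer sizes are assumptions, and the PPO update itself is omitted.

```python
# Minimal sketch (not the authors' code): a hard-attention "glimpse" step in the
# spirit of RAM, where the agent first picks a location and only then reads the
# pixels of that region, followed by a recurrent core and PPO-style heads.
# All module names, sizes, and the glimpse size are illustrative assumptions.
import torch
import torch.nn as nn


def extract_glimpse(image: torch.Tensor, loc: torch.Tensor, size: int = 16) -> torch.Tensor:
    """Crop a (size x size) patch around `loc` in [-1, 1]^2; only these pixels are observed."""
    b, _, h, w = image.shape
    # Map normalized locations to top-left pixel coordinates, clamped inside the image.
    cx = ((loc[:, 0] + 1) / 2 * (w - size)).long().clamp(0, w - size)
    cy = ((loc[:, 1] + 1) / 2 * (h - size)).long().clamp(0, h - size)
    patches = [image[i, :, cy[i]:cy[i] + size, cx[i]:cx[i] + size] for i in range(b)]
    return torch.stack(patches)


class HardAttentionActorCritic(nn.Module):
    """Glimpse encoder + GRU core + location, action, and value heads (hypothetical sizes)."""

    def __init__(self, n_actions: int, glimpse_size: int = 16, hidden: int = 256):
        super().__init__()
        self.glimpse_size = glimpse_size
        self.encoder = nn.Sequential(
            nn.Flatten(), nn.Linear(3 * glimpse_size * glimpse_size, hidden), nn.ReLU()
        )
        self.core = nn.GRUCell(hidden, hidden)
        self.loc_head = nn.Linear(hidden, 2)          # where to look next (in [-1, 1]^2)
        self.pi_head = nn.Linear(hidden, n_actions)   # environment action logits
        self.v_head = nn.Linear(hidden, 1)            # state-value estimate for PPO

    def step(self, image, h):
        # 1) Decide where to look using only the recurrent state (no access to pixels yet).
        loc = torch.tanh(self.loc_head(h))
        # 2) Only now read the pixels of the chosen region.
        glimpse = extract_glimpse(image, loc, self.glimpse_size)
        h = self.core(self.encoder(glimpse), h)
        # 3) Act and evaluate from the updated state.
        return self.pi_head(h), self.v_head(h), loc, h


if __name__ == "__main__":
    model = HardAttentionActorCritic(n_actions=6)
    frame = torch.rand(1, 3, 84, 84)                  # e.g. an Atari-sized RGB frame
    h0 = torch.zeros(1, 256)
    logits, value, loc, h1 = model.step(frame, h0)
    print(logits.shape, value.shape, loc)
```

A PPO training loop would roll this `step` over consecutive frames, storing action and location log-probabilities together with the value estimates to compute the clipped surrogate loss; for a continuous-control task such as CarRacing, the discrete action head would presumably be replaced by a Gaussian over the control inputs.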