Learning to Perceive in Deep Model-Free Reinforcement Learning

Gonçalo Querido, Alberto Sardinha, Francisco S. Melo
{"title":"Learning to Perceive in Deep Model-Free Reinforcement Learning","authors":"Gonccalo Querido, Alberto Sardinha, Francisco S. Melo","doi":"10.48550/arXiv.2301.03730","DOIUrl":null,"url":null,"abstract":"This work proposes a novel model-free Reinforcement Learning (RL) agent that is able to learn how to complete an unknown task having access to only a part of the input observation. We take inspiration from the concepts of visual attention and active perception that are characteristic of humans and tried to apply them to our agent, creating a hard attention mechanism. In this mechanism, the model decides first which region of the input image it should look at, and only after that it has access to the pixels of that region. Current RL agents do not follow this principle and we have not seen these mechanisms applied to the same purpose as this work. In our architecture, we adapt an existing model called recurrent attention model (RAM) and combine it with the proximal policy optimization (PPO) algorithm. We investigate whether a model with these characteristics is capable of achieving similar performance to state-of-the-art model-free RL agents that access the full input observation. This analysis is made in two Atari games, Pong and SpaceInvaders, which have a discrete action space, and in CarRacing, which has a continuous action space. Besides assessing its performance, we also analyze the movement of the attention of our model and compare it with what would be an example of the human behavior. Even with such visual limitation, we show that our model matches the performance of PPO+LSTM in two of the three games tested.","PeriodicalId":326727,"journal":{"name":"Adaptive Agents and Multi-Agent Systems","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2023-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Adaptive Agents and Multi-Agent Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2301.03730","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

This work proposes a novel model-free Reinforcement Learning (RL) agent that learns to complete an unknown task while having access to only part of the input observation. We take inspiration from the concepts of visual attention and active perception that are characteristic of humans and apply them to our agent, creating a hard attention mechanism. In this mechanism, the model first decides which region of the input image it should look at, and only then gains access to the pixels of that region. Current RL agents do not follow this principle, and we have not seen these mechanisms applied to the same purpose as in this work. In our architecture, we adapt an existing model, the recurrent attention model (RAM), and combine it with the proximal policy optimization (PPO) algorithm. We investigate whether a model with these characteristics can achieve performance similar to state-of-the-art model-free RL agents that access the full input observation. This analysis is conducted on two Atari games, Pong and SpaceInvaders, which have a discrete action space, and on CarRacing, which has a continuous action space. Besides assessing its performance, we also analyze how the model's attention moves and compare it with an example of human behavior. Even with this visual limitation, we show that our model matches the performance of PPO+LSTM in two of the three games tested.
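The abstract describes the agent's core loop (choose a location, read only that region, then act) but gives no implementation details. The sketch below is a minimal, hypothetical illustration of such a hard-attention glimpse step wired to PPO-style policy and value heads in PyTorch; it is not the authors' implementation. The glimpse size, the GRU core (the paper's baseline is PPO+LSTM), the names `HardAttentionActorCritic` and `extract_glimpse`, and all layer sizes are assumptions, and the PPO update itself is omitted.

```python
# Minimal sketch (not the authors' code): a hard-attention "glimpse" step in the
# spirit of RAM, where the agent first picks a location and only then reads the
# pixels of that region, followed by a recurrent core and PPO-style heads.
# All module names, sizes, and the glimpse size are illustrative assumptions.
import torch
import torch.nn as nn


def extract_glimpse(image: torch.Tensor, loc: torch.Tensor, size: int = 16) -> torch.Tensor:
    """Crop a (size x size) patch around `loc` in [-1, 1]^2; only these pixels are observed."""
    b, _, h, w = image.shape
    # Map normalized locations to top-left pixel coordinates, clamped inside the image.
    cx = ((loc[:, 0] + 1) / 2 * (w - size)).long().clamp(0, w - size)
    cy = ((loc[:, 1] + 1) / 2 * (h - size)).long().clamp(0, h - size)
    patches = [image[i, :, cy[i]:cy[i] + size, cx[i]:cx[i] + size] for i in range(b)]
    return torch.stack(patches)


class HardAttentionActorCritic(nn.Module):
    """Glimpse encoder + GRU core + location, action, and value heads (hypothetical sizes)."""

    def __init__(self, n_actions: int, glimpse_size: int = 16, hidden: int = 256):
        super().__init__()
        self.glimpse_size = glimpse_size
        self.encoder = nn.Sequential(
            nn.Flatten(), nn.Linear(3 * glimpse_size * glimpse_size, hidden), nn.ReLU()
        )
        self.core = nn.GRUCell(hidden, hidden)
        self.loc_head = nn.Linear(hidden, 2)          # where to look next (in [-1, 1]^2)
        self.pi_head = nn.Linear(hidden, n_actions)   # environment action logits
        self.v_head = nn.Linear(hidden, 1)            # state-value estimate for PPO

    def step(self, image, h):
        # 1) Decide where to look using only the recurrent state (no access to pixels yet).
        loc = torch.tanh(self.loc_head(h))
        # 2) Only now read the pixels of the chosen region.
        glimpse = extract_glimpse(image, loc, self.glimpse_size)
        h = self.core(self.encoder(glimpse), h)
        # 3) Act and evaluate from the updated state.
        return self.pi_head(h), self.v_head(h), loc, h


if __name__ == "__main__":
    model = HardAttentionActorCritic(n_actions=6)
    frame = torch.rand(1, 3, 84, 84)                  # e.g. an Atari-sized RGB frame
    h0 = torch.zeros(1, 256)
    logits, value, loc, h1 = model.step(frame, h0)
    print(logits.shape, value.shape, loc)
```

A PPO training loop would roll this `step` over consecutive frames, storing action and location log-probabilities together with the value estimates to compute the clipped surrogate loss; for a continuous-control task such as CarRacing, the discrete action head would presumably be replaced by a Gaussian over the control inputs.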