Navigating the unknown: Leveraging self-information and diversity in partially observable environments

IF 2.5 | Zone 3 (Biology) | JCR Q3 (Biochemistry & Molecular Biology) | Biochemical and Biophysical Research Communications | Pub Date: 2024-11-19 | DOI: 10.1016/j.bbrc.2024.150923
Devdhar Patel, Hava T. Siegelmann
Biochemical and Biophysical Research Communications, Volume 741, Article 150923. Journal Article. https://www.sciencedirect.com/science/article/pii/S0006291X24014591

Abstract

Reinforcement learning algorithms often struggle to learn in partially observable environments, where different states of the environment may appear identical. However, not all partially observable environments are equally difficult to learn in. This work introduces dissonance distance, a metric that estimates the difficulty of learning in such environments. We demonstrate that self-information, such as internal oscillations or memory of previous actions, can increase the dissonance distance and make learning easier in partially observable environments. Additionally, sensory occlusion may occur after learning is complete, leaving the agent with insufficient information and causing catastrophic failure. To address this, we propose a spatially layered architecture (SLA), inspired by the brain, which trains multiple policies in parallel for the same task. SLA can change the amount of external information processed at each timestep, providing an adaptive way to handle changing information in the environment state-space. We evaluate SLA on the partially observable Continuous Mountain Car environment, showing learnability and robustness against realistic noise and occlusion in sensory inputs. We hypothesize that multi-policy approaches like SLA might explain the complex dopamine dynamics in the brain that cannot be explained by the state-of-the-art scalar Temporal Difference error.
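The state-aliasing problem described in the abstract, and the way memory of a previous action can help, can be illustrated with a small sketch. This is not the authors' method and does not compute dissonance distance, whose definition is not given here. The position-only observation, the `State` class, and both observation functions are hypothetical names chosen for illustration, and the pairing of previous actions with velocities is an assumption.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class State:
    """Hypothetical full state of a car on a 1-D track."""
    position: float
    velocity: float


def observe_position_only(state: State) -> Tuple[float]:
    # Assumed partial observation: the velocity is hidden from the agent.
    return (state.position,)


def observe_with_prev_action(state: State, prev_action: float) -> Tuple[float, float]:
    # Same partial observation, augmented with self-information:
    # the action the agent itself took on the previous timestep.
    return (state.position, prev_action)


# Two distinct states: same position, opposite velocities.
moving_right = State(position=-0.5, velocity=0.03)
moving_left = State(position=-0.5, velocity=-0.03)

# Without self-information the two states look identical to the agent.
aliased = observe_position_only(moving_right) == observe_position_only(moving_left)

# Assumption for illustration: the agent was pushing right (+1.0) in one case
# and left (-1.0) in the other, so its remembered action separates the states.
separated = (
    observe_with_prev_action(moving_right, 1.0)
    != observe_with_prev_action(moving_left, -1.0)
)

print(aliased, separated)
```

In this toy case the remembered action happens to correlate with the hidden velocity. In general it only helps to the extent that such a correlation exists.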
Source journal
Biochemical and Biophysical Research Communications (Biology: Biochemistry & Molecular Biology)
CiteScore: 6.10
Self-citation rate: 0.00%
Articles published: 1400
Review time: 14 days
Journal description: Biochemical and Biophysical Research Communications is the premier international journal devoted to the very rapid dissemination of timely and significant experimental results in diverse fields of biological research. The development of the "Breakthroughs and Views" section brings the minireview format to the journal, and issues often contain collections of special interest manuscripts. BBRC is published weekly (52 issues/year). Research areas now include: biochemistry; biophysics; cell biology; developmental biology; immunology; molecular biology; neurobiology; plant biology; and proteomics.