Navigating the unknown: Leveraging self-information and diversity in partially observable environments
Authors: Devdhar Patel, Hava T. Siegelmann
Journal: Biochemical and Biophysical Research Communications, volume 741, Article 150923
Publication date: 2024-11-19
DOI: 10.1016/j.bbrc.2024.150923
URL: https://www.sciencedirect.com/science/article/pii/S0006291X24014591
Impact factor: 2.5; JCR: Q3 (Biochemistry & Molecular Biology)
Reinforcement learning algorithms often struggle in partially observable environments, where different states of the environment may appear identical. However, not all partially observable environments are equally difficult to learn. This work introduces dissonance distance, a metric that estimates the difficulty of learning in such environments. We demonstrate that self-information, such as internal oscillations or memory of previous actions, can increase the dissonance distance and make learning easier in partially observable environments. Additionally, sensory occlusion may occur after learning is complete, leaving the agent with insufficient information and leading to catastrophic failure. To address this, we propose a spatially layered architecture (SLA), inspired by the brain, that trains multiple policies in parallel for the same task. SLA can change the amount of external information processed at each timestep, providing an adaptive way to handle changing information in the environment state space. We evaluate SLA on the partially observable Continuous Mountain Car environment, showing learnability and robustness to realistic noise and occlusion in sensory inputs. We hypothesize that multi-policy approaches like SLA might explain complex dopamine dynamics in the brain that cannot be explained by the state-of-the-art scalar temporal difference error.
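To make the idea of states that "appear identical" concrete, here is a minimal sketch, not the authors' implementation. It assumes a simplified mountain-car model whose constants follow the common Continuous Mountain Car benchmark, a position-only observation, and memory of the previous action as the self-information. The names `CarState`, `step`, `observe`, and `observe_with_self_info` are hypothetical, and the dissonance distance itself is not computed because the abstract does not define it.

```python
import math
from dataclasses import dataclass

# Assumed constants, modeled on the common Continuous Mountain Car
# benchmark; they are not taken from the paper.
POWER = 0.0015
GRAVITY = 0.0025
MIN_POSITION, MAX_POSITION = -1.2, 0.6
MAX_SPEED = 0.07


@dataclass(frozen=True)
class CarState:
    """Full (hidden) state of the car."""
    position: float
    velocity: float


def step(state: CarState, action: float) -> CarState:
    """Advance the simplified dynamics by one timestep."""
    force = max(-1.0, min(1.0, action))
    velocity = state.velocity + force * POWER - GRAVITY * math.cos(3 * state.position)
    velocity = max(-MAX_SPEED, min(MAX_SPEED, velocity))
    position = max(MIN_POSITION, min(MAX_POSITION, state.position + velocity))
    if position == MIN_POSITION and velocity < 0:
        velocity = 0.0  # the car stops at the left wall
    return CarState(position, velocity)


def observe(state: CarState) -> tuple:
    """Partial observation: the velocity is hidden from the agent."""
    return (state.position,)


def observe_with_self_info(state: CarState, prev_action: float) -> tuple:
    """Partial observation augmented with the agent's own previous action."""
    return (state.position, prev_action)


# Two hand-picked hidden states: same position, opposite velocities.
moving_right = CarState(position=-0.5, velocity=0.03)
moving_left = CarState(position=-0.5, velocity=-0.03)

# Without self-information the two states look identical to the agent.
aliased = observe(moving_right) == observe(moving_left)

# If the agent remembers that it last pushed right in one case and left in
# the other, the augmented observations differ.
separated = (
    observe_with_self_info(moving_right, 1.0)
    != observe_with_self_info(moving_left, -1.0)
)

print("aliased without self-information:", aliased)
print("separated with previous action:", separated)
```

In this hand-picked case the previous action happens to separate the two states. In general it is only a partial cue, since the same action can precede different velocities.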
About the journal:
Biochemical and Biophysical Research Communications is the premier international journal devoted to the very rapid dissemination of timely and significant experimental results in diverse fields of biological research. The development of the "Breakthroughs and Views" section brings the minireview format to the journal, and issues often contain collections of special interest manuscripts. BBRC is published weekly (52 issues/year). Research areas now include biochemistry, biophysics, cell biology, developmental biology, immunology, molecular biology, neurobiology, plant biology, and proteomics.