利用完全可观察策略进行部分可观察学习

Conference on Robot Learning Pub Date : 2022-11-03 DOI:10.48550/arXiv.2211.01991

Hai V. Nguyen, Andrea Baisero, Dian Wang, Chris Amato, Robert W. Platt

{"title":"利用完全可观察策略进行部分可观察学习","authors":"Hai V. Nguyen, Andrea Baisero, Dian Wang, Chris Amato, Robert W. Platt","doi":"10.48550/arXiv.2211.01991","DOIUrl":null,"url":null,"abstract":"Reinforcement learning in partially observable domains is challenging due to the lack of observable state information. Thankfully, learning offline in a simulator with such state information is often possible. In particular, we propose a method for partially observable reinforcement learning that uses a fully observable policy (which we call a state expert) during offline training to improve online performance. Based on Soft Actor-Critic (SAC), our agent balances performing actions similar to the state expert and getting high returns under partial observability. Our approach can leverage the fully-observable policy for exploration and parts of the domain that are fully observable while still being able to learn under partial observability. On six robotics domains, our method outperforms pure imitation, pure reinforcement learning, the sequential or parallel combination of both types, and a recent state-of-the-art method in the same setting. A successful policy transfer to a physical robot in a manipulation task from pixels shows our approach's practicality in learning interesting policies under partial observability.","PeriodicalId":273870,"journal":{"name":"Conference on Robot Learning","volume":"58 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":"{\"title\":\"Leveraging Fully Observable Policies for Learning under Partial Observability\",\"authors\":\"Hai V. Nguyen, Andrea Baisero, Dian Wang, Chris Amato, Robert W. Platt\",\"doi\":\"10.48550/arXiv.2211.01991\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Reinforcement learning in partially observable domains is challenging due to the lack of observable state information. Thankfully, learning offline in a simulator with such state information is often possible. In particular, we propose a method for partially observable reinforcement learning that uses a fully observable policy (which we call a state expert) during offline training to improve online performance. Based on Soft Actor-Critic (SAC), our agent balances performing actions similar to the state expert and getting high returns under partial observability. Our approach can leverage the fully-observable policy for exploration and parts of the domain that are fully observable while still being able to learn under partial observability. On six robotics domains, our method outperforms pure imitation, pure reinforcement learning, the sequential or parallel combination of both types, and a recent state-of-the-art method in the same setting. A successful policy transfer to a physical robot in a manipulation task from pixels shows our approach's practicality in learning interesting policies under partial observability.\",\"PeriodicalId\":273870,\"journal\":{\"name\":\"Conference on Robot Learning\",\"volume\":\"58 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-11-03\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"6\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Conference on Robot Learning\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.48550/arXiv.2211.01991\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Conference on Robot Learning","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2211.01991","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 6

摘要

部分可观察域的强化学习由于缺乏可观察状态信息而具有挑战性。值得庆幸的是，在具有这种状态信息的模拟器中离线学习通常是可能的。特别是，我们提出了一种部分可观察强化学习的方法，该方法在离线训练期间使用完全可观察策略(我们称之为状态专家)来提高在线性能。基于软行为者-批评家(SAC)，我们的智能体在部分可观察性下平衡了执行类似于状态专家的行为和获得高回报。我们的方法可以利用完全可观察策略来探索和部分完全可观察的领域，同时仍然能够在部分可观察性下学习。在六个机器人领域，我们的方法优于纯模仿，纯强化学习，两种类型的顺序或并行组合，以及在相同设置下的最新最先进的方法。在像素操作任务中成功地将策略转移到物理机器人上表明了我们的方法在部分可观察性下学习有趣策略的实用性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Leveraging Fully Observable Policies for Learning under Partial Observability

Reinforcement learning in partially observable domains is challenging due to the lack of observable state information. Thankfully, learning offline in a simulator with such state information is often possible. In particular, we propose a method for partially observable reinforcement learning that uses a fully observable policy (which we call a state expert) during offline training to improve online performance. Based on Soft Actor-Critic (SAC), our agent balances performing actions similar to the state expert and getting high returns under partial observability. Our approach can leverage the fully-observable policy for exploration and parts of the domain that are fully observable while still being able to learn under partial observability. On six robotics domains, our method outperforms pure imitation, pure reinforcement learning, the sequential or parallel combination of both types, and a recent state-of-the-art method in the same setting. A successful policy transfer to a physical robot in a manipulation task from pixels shows our approach's practicality in learning interesting policies under partial observability.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Conference on Robot Learning

自引率

0.00%

发文量

期刊最新文献

MResT: Multi-Resolution Sensing for Real-Time Control with Vision-Language Models Lidar Line Selection with Spatially-Aware Shapley Value for Cost-Efficient Depth Completion Safe Robot Learning in Assistive Devices through Neural Network Repair COACH: Cooperative Robot Teaching Learning Goal-Conditioned Policies Offline with Self-Supervised Reward Shaping