Context-Generative Default Policy for Bounded Rational Agent

Durgakant Pushp, Junhong Xu, Zheng Chen, Lantao Liu
{"title":"Context-Generative Default Policy for Bounded Rational Agent","authors":"Durgakant Pushp, Junhong Xu, Zheng Chen, Lantao Liu","doi":"arxiv-2409.11604","DOIUrl":null,"url":null,"abstract":"Bounded rational agents often make decisions by evaluating a finite selection\nof choices, typically derived from a reference point termed the $`$default\npolicy,' based on previous experience. However, the inherent rigidity of the\nstatic default policy presents significant challenges for agents when operating\nin unknown environment, that are not included in agent's prior knowledge. In\nthis work, we introduce a context-generative default policy that leverages the\nregion observed by the robot to predict unobserved part of the environment,\nthereby enabling the robot to adaptively adjust its default policy based on\nboth the actual observed map and the $\\textit{imagined}$ unobserved map.\nFurthermore, the adaptive nature of the bounded rationality framework enables\nthe robot to manage unreliable or incorrect imaginations by selectively\nsampling a few trajectories in the vicinity of the default policy. Our approach\nutilizes a diffusion model for map prediction and a sampling-based planning\nwith B-spline trajectory optimization to generate the default policy. Extensive\nevaluations reveal that the context-generative policy outperforms the baseline\nmethods in identifying and avoiding unseen obstacles. 
Additionally, real-world\nexperiments conducted with the Crazyflie drones demonstrate the adaptability of\nour proposed method, even when acting in environments outside the domain of the\ntraining distribution.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Robotics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.11604","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

Bounded rational agents often make decisions by evaluating a finite selection of choices, typically derived from a reference point termed the `default policy,' based on previous experience. However, the inherent rigidity of a static default policy presents significant challenges when agents operate in unknown environments that are not included in their prior knowledge. In this work, we introduce a context-generative default policy that leverages the region observed by the robot to predict the unobserved part of the environment, thereby enabling the robot to adaptively adjust its default policy based on both the actually observed map and the *imagined* unobserved map. Furthermore, the adaptive nature of the bounded rationality framework enables the robot to manage unreliable or incorrect imaginations by selectively sampling a few trajectories in the vicinity of the default policy. Our approach uses a diffusion model for map prediction and sampling-based planning with B-spline trajectory optimization to generate the default policy. Extensive evaluations show that the context-generative policy outperforms baseline methods in identifying and avoiding unseen obstacles. Additionally, real-world experiments with Crazyflie drones demonstrate the adaptability of the proposed method, even when acting in environments outside the training distribution.
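The abstract's bounded-rationality idea — evaluating only a finite set of candidates drawn near the default policy — can be illustrated with a minimal sketch. This is not the paper's implementation; the waypoint parameterization, the Gaussian perturbation, and the toy `cost_fn` (path length plus a penalty for a hypothetical obstacle disc) are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_near_default(default_waypoints, n_samples=16, sigma=0.2):
    """Draw candidate trajectories as Gaussian perturbations of the
    default policy's waypoints, keeping start and goal fixed."""
    candidates = []
    for _ in range(n_samples):
        noise = sigma * rng.standard_normal(default_waypoints.shape)
        noise[0] = noise[-1] = 0.0  # anchor the endpoints
        candidates.append(default_waypoints + noise)
    return candidates

def pick_best(candidates, cost_fn):
    """Bounded-rational choice: evaluate only the finite candidate set."""
    costs = [cost_fn(c) for c in candidates]
    return candidates[int(np.argmin(costs))]

def cost_fn(traj, obstacle=(0.5, 0.5), radius=0.2):
    """Toy cost: path length plus a penalty per waypoint inside an
    obstacle disc (standing in for a newly imagined obstacle)."""
    length = np.sum(np.linalg.norm(np.diff(traj, axis=0), axis=1))
    hits = np.sum(np.linalg.norm(traj - np.asarray(obstacle), axis=1) < radius)
    return length + 10.0 * hits

default = np.linspace([0.0, 0.0], [1.0, 1.0], 8)  # straight-line default policy
best = pick_best(sample_near_default(default), cost_fn)
```

The key property mirrored here is that the agent never searches the full trajectory space: when the default policy (or the imagined map behind it) is wrong, the local samples give it a cheap way to deviate.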
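The B-spline step can likewise be sketched: fitting a smooth spline through a planner's waypoints and resampling it densely. This uses SciPy's generic `splprep`/`splev` fitting as an illustrative stand-in for the paper's B-spline trajectory optimization; the waypoints and parameters are invented for the example.

```python
import numpy as np
from scipy.interpolate import splprep, splev

def bspline_smooth(waypoints, n_points=50, smooth=0.0, degree=3):
    """Fit a degree-3 B-spline through 2-D waypoints (s=0 interpolates
    them exactly) and resample it at n_points parameter values."""
    tck, _ = splprep([waypoints[:, 0], waypoints[:, 1]], s=smooth, k=degree)
    u = np.linspace(0.0, 1.0, n_points)
    x, y = splev(u, tck)
    return np.column_stack([x, y])

waypoints = np.array([[0.0, 0.0], [0.3, 0.6], [0.7, 0.4], [1.0, 1.0]])
traj = bspline_smooth(waypoints)
```

A spline representation like this is a common choice for drone trajectories because its continuity up to the spline degree yields smooth velocity and acceleration profiles from a sparse set of planned waypoints.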