Durgakant Pushp, Junhong Xu, Zheng Chen, Lantao Liu
arXiv:2409.11604 · arXiv - CS - Robotics · 2024-09-17
Context-Generative Default Policy for Bounded Rational Agent
Bounded rational agents often make decisions by evaluating a finite selection
of choices, typically derived from a reference point termed the `default
policy', based on previous experience. However, the inherent rigidity of a
static default policy presents significant challenges for agents operating
in unknown environments that are not included in the agent's prior knowledge. In
this work, we introduce a context-generative default policy that leverages the
region observed by the robot to predict the unobserved part of the environment,
thereby enabling the robot to adaptively adjust its default policy based on
both the actual observed map and the $\textit{imagined}$ unobserved map.
Furthermore, the adaptive nature of the bounded rationality framework enables
the robot to manage unreliable or incorrect imagined maps by selectively
sampling a few trajectories in the vicinity of the default policy. Our approach
utilizes a diffusion model for map prediction and sampling-based planning
with B-spline trajectory optimization to generate the default policy. Extensive
evaluations reveal that the context-generative policy outperforms the baseline
methods in identifying and avoiding unseen obstacles. Additionally, real-world
experiments conducted with Crazyflie drones demonstrate the adaptability of
our proposed method, even when acting in environments outside the domain of the
training distribution.
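
The selective-sampling idea in the abstract — perturb the B-spline default policy slightly and keep the lowest-cost candidate against the (possibly imagined) obstacle map — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the uniform cubic B-spline evaluation is standard, but the Gaussian perturbation scale, the circular-obstacle cost, and all function names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def cubic_bspline(ctrl, n=20):
    """Sample points along a uniform cubic B-spline with control
    points ctrl (shape [k, 2], k >= 4), via the cubic basis matrix."""
    pts = []
    for i in range(len(ctrl) - 3):
        seg = ctrl[i:i + 4]
        for t in np.linspace(0.0, 1.0, n, endpoint=False):
            basis = np.array([(1 - t) ** 3,
                              3 * t ** 3 - 6 * t ** 2 + 4,
                              -3 * t ** 3 + 3 * t ** 2 + 3 * t + 1,
                              t ** 3]) / 6.0
            pts.append(basis @ seg)
    return np.array(pts)

def collision_cost(traj, obstacles, radius=0.5):
    """Hypothetical cost: penalize proximity to circular obstacles
    (which may come from the imagined map); lower is better."""
    cost = 0.0
    for center in obstacles:
        d = np.linalg.norm(traj - center, axis=1)
        cost += np.sum(np.maximum(0.0, radius - d))
    return cost

def sample_near_default(default_ctrl, obstacles, n_samples=8, sigma=0.3):
    """Bounded-rational selection: evaluate a few Gaussian perturbations
    of the default policy's control points, keep the cheapest candidate.
    The default itself is always a candidate, so the result never costs
    more than the default policy."""
    best_ctrl = default_ctrl
    best_cost = collision_cost(cubic_bspline(default_ctrl), obstacles)
    for _ in range(n_samples):
        cand = default_ctrl + rng.normal(0.0, sigma, default_ctrl.shape)
        cand[0], cand[-1] = default_ctrl[0], default_ctrl[-1]  # pin endpoints
        c = collision_cost(cubic_bspline(cand), obstacles)
        if c < best_cost:
            best_ctrl, best_cost = cand, c
    return best_ctrl, best_cost
```

With a straight-line default policy passing through an imagined obstacle, the sampled alternatives tend to bend around it, and by construction the selected trajectory is never worse than the default.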