Junhong Xu, Kai Yin, Zheng Chen, Jason M Gregory, Ethan A Stump, Lantao Liu
Kernel-based diffusion approximated Markov decision processes for autonomous navigation and control on unstructured terrains
The International Journal of Robotics Research, published 2024-01-19
DOI: 10.1177/02783649231225977 (https://doi.org/10.1177/02783649231225977)
Abstract
We propose a diffusion approximation method for continuous-state Markov decision processes (MDPs) that can be used for autonomous navigation and control in unstructured off-road environments. Most decision-theoretic planning frameworks assume a fully known state transition model, a strong assumption that is often extremely difficult to satisfy in practice; our method removes it. We first take the second-order Taylor expansion of the value function. The Bellman optimality equation is then approximated by a partial differential equation that relies only on the first and second moments of the transition model. By combining this approximation with a kernel representation of the value function, we design an efficient policy iteration algorithm whose policy evaluation step reduces to a linear system of equations characterized by a finite set of supporting states. We first validate the proposed method through extensive simulations of 2D obstacle avoidance and 2.5D terrain navigation problems. The results show that the proposed approach substantially outperforms several baselines. We then develop a system that integrates our decision-making framework with onboard perception and conduct real-world experiments in both cluttered indoor and unstructured outdoor environments. The results from the physical systems further demonstrate the applicability of our method in challenging real-world environments.
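The policy evaluation step summarized in the abstract can be illustrated with a minimal one-dimensional sketch. It is not the authors' implementation. It assumes a discounted Bellman equation of the form V(s) = R(s) + γ E[V(s')], a Gaussian kernel, and collocation at the supporting states. All names (`kernel_terms`, `evaluate_policy`, `drift`, `second_moment`) and the toy numbers are hypothetical. Under these assumptions, the second-order Taylor expansion gives (1 − γ) V(s) − γ [μ(s) V'(s) + ½ m₂(s) V''(s)] = R(s), where μ and m₂ are the first and second moments of the displacement s' − s. Writing V as a weighted sum of kernels turns this into a linear system for the weights.

```python
import numpy as np


def kernel_terms(states, centers, lengthscale):
    """Gaussian kernel k(s, c) and its first/second derivatives in s (1D)."""
    diff = states[:, None] - centers[None, :]
    k = np.exp(-0.5 * (diff / lengthscale) ** 2)
    dk = -diff / lengthscale**2 * k
    d2k = (diff**2 / lengthscale**4 - 1.0 / lengthscale**2) * k
    return k, dk, d2k


def evaluate_policy(support, reward, drift, second_moment, gamma, lengthscale):
    """Solve for kernel weights w with V(s) = sum_j w_j k(s, s_j).

    Each row i enforces the diffusion-approximated Bellman equation at the
    supporting state s_i (assumed form, see the lead-in):
        (1 - gamma) V - gamma * (drift * V' + 0.5 * second_moment * V'') = R
    """
    k, dk, d2k = kernel_terms(support, support, lengthscale)
    A = (1.0 - gamma) * k - gamma * (
        drift[:, None] * dk + 0.5 * second_moment[:, None] * d2k
    )
    weights = np.linalg.solve(A, reward)
    return weights, A


# Toy problem (all values are illustrative assumptions).
support = np.linspace(0.0, 1.0, 21)              # supporting states
lengthscale = support[1] - support[0]            # kernel width = grid spacing
drift = np.full_like(support, 0.02)              # E[s' - s] under the policy
second_moment = drift**2 + 0.02**2               # E[(s' - s)^2]
reward = np.exp(-0.5 * ((support - 0.8) / 0.1) ** 2)  # reward bump near s = 0.8
gamma = 0.95

weights, A = evaluate_policy(
    support, reward, drift, second_moment, gamma, lengthscale
)
values = kernel_terms(support, support, lengthscale)[0] @ weights
```

Only the two moments enter the system matrix, so in this sketch they could be replaced by estimates from observed transitions without specifying a full transition model. The kernel width is tied to the spacing of the supporting states here purely to keep the toy linear system well conditioned.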