{"title":"Leveraging Symmetry to Accelerate Learning of Trajectory Tracking Controllers for Free-Flying Robotic Systems","authors":"Jake Welde, Nishanth Rao, Pratik Kunapuli, Dinesh Jayaraman, Vijay Kumar","doi":"arxiv-2409.11238","DOIUrl":null,"url":null,"abstract":"Tracking controllers enable robotic systems to accurately follow planned\nreference trajectories. In particular, reinforcement learning (RL) has shown\npromise in the synthesis of controllers for systems with complex dynamics and\nmodest online compute budgets. However, the poor sample efficiency of RL and\nthe challenges of reward design make training slow and sometimes unstable,\nespecially for high-dimensional systems. In this work, we leverage the inherent\nLie group symmetries of robotic systems with a floating base to mitigate these\nchallenges when learning tracking controllers. We model a general tracking\nproblem as a Markov decision process (MDP) that captures the evolution of both\nthe physical and reference states. Next, we prove that symmetry in the\nunderlying dynamics and running costs leads to an MDP homomorphism, a mapping\nthat allows a policy trained on a lower-dimensional \"quotient\" MDP to be lifted\nto an optimal tracking controller for the original system. We compare this\nsymmetry-informed approach to an unstructured baseline, using Proximal Policy\nOptimization (PPO) to learn tracking controllers for three systems: the\nParticle (a forced point mass), the Astrobee (a fullyactuated space robot), and\nthe Quadrotor (an underactuated system). Results show that a symmetry-aware\napproach both accelerates training and reduces tracking error after the same\nnumber of training steps.","PeriodicalId":501175,"journal":{"name":"arXiv - EE - Systems and Control","volume":"8 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - EE - Systems and Control","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.11238","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Tracking controllers enable robotic systems to accurately follow planned
reference trajectories. In particular, reinforcement learning (RL) has shown
promise in the synthesis of controllers for systems with complex dynamics and
modest online compute budgets. However, the poor sample efficiency of RL and
the challenges of reward design make training slow and sometimes unstable,
especially for high-dimensional systems. In this work, we leverage the inherent
Lie group symmetries of robotic systems with a floating base to mitigate these
challenges when learning tracking controllers. We model a general tracking
problem as a Markov decision process (MDP) that captures the evolution of both
the physical and reference states. Next, we prove that symmetry in the
underlying dynamics and running costs leads to an MDP homomorphism, a mapping
that allows a policy trained on a lower-dimensional "quotient" MDP to be lifted
to an optimal tracking controller for the original system. We compare this
symmetry-informed approach to an unstructured baseline, using Proximal Policy
Optimization (PPO) to learn tracking controllers for three systems: the
Particle (a forced point mass), the Astrobee (a fully actuated space robot), and
the Quadrotor (an underactuated system). Results show that a symmetry-aware
approach both accelerates training and reduces tracking error after the same
number of training steps.
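
To make the quotient-and-lift construction concrete, here is a minimal sketch (not the paper's implementation) for the simplest of the three benchmark systems, the Particle, whose dynamics and tracking cost are invariant under translation. The function names (quotient_map, lift_policy) and the stand-in PD gains are illustrative assumptions; in the paper, the quotient policy would instead be trained with PPO on the lower-dimensional quotient MDP.

```python
import numpy as np

def quotient_map(x, v, x_ref, v_ref):
    """Map the full tracking state (physical + reference) to quotient
    coordinates, removing the translational symmetry: only the tracking
    error survives the quotient."""
    return np.concatenate([x_ref - x, v_ref - v])

def lift_policy(quotient_policy):
    """Lift a policy defined on the quotient MDP to a tracking controller
    on the original state space by composing it with the quotient map."""
    def full_policy(x, v, x_ref, v_ref):
        return quotient_policy(quotient_map(x, v, x_ref, v_ref))
    return full_policy

# A hand-written PD law stands in for a learned quotient policy here;
# the gains are hypothetical, chosen only for illustration.
pd_quotient_policy = lambda z: 4.0 * z[:3] + 2.0 * z[3:]
controller = lift_policy(pd_quotient_policy)

# Force command pushing the particle from the origin toward a unit offset.
u = controller(np.zeros(3), np.zeros(3), np.ones(3), np.zeros(3))
print(u)  # -> [4. 4. 4.]
```

Because the Particle's dynamics and running cost are invariant under translating the physical and reference states together, restricting the policy to error coordinates loses no optimality: the lifted policy is optimal for the full tracking MDP whenever the quotient policy is optimal for the quotient MDP, which is the content of the MDP homomorphism result described above.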