Leveraging Symmetry to Accelerate Learning of Trajectory Tracking Controllers for Free-Flying Robotic Systems

Jake Welde, Nishanth Rao, Pratik Kunapuli, Dinesh Jayaraman, Vijay Kumar
arXiv - EE - Systems and Control, published 2024-09-17, identifier arxiv-2409.11238

Abstract

Tracking controllers enable robotic systems to accurately follow planned reference trajectories. In particular, reinforcement learning (RL) has shown promise in the synthesis of controllers for systems with complex dynamics and modest online compute budgets. However, the poor sample efficiency of RL and the challenges of reward design make training slow and sometimes unstable, especially for high-dimensional systems. In this work, we leverage the inherent Lie group symmetries of robotic systems with a floating base to mitigate these challenges when learning tracking controllers. We model a general tracking problem as a Markov decision process (MDP) that captures the evolution of both the physical and reference states. Next, we prove that symmetry in the underlying dynamics and running costs leads to an MDP homomorphism, a mapping that allows a policy trained on a lower-dimensional "quotient" MDP to be lifted to an optimal tracking controller for the original system. We compare this symmetry-informed approach to an unstructured baseline, using Proximal Policy Optimization (PPO) to learn tracking controllers for three systems: the Particle (a forced point mass), the Astrobee (a fully actuated space robot), and the Quadrotor (an underactuated system). Results show that a symmetry-aware approach both accelerates training and reduces tracking error after the same number of training steps.
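To make the idea of lifting a policy from a quotient MDP concrete, the following is a minimal sketch for a forced point mass. It is not the paper's construction. It assumes only translational symmetry, a forward-Euler integrator, and a hand-written PD law standing in for a PPO-trained policy. All names (`step`, `quotient`, `reduced_policy`, `lifted_policy`) and constants are hypothetical.

```python
from typing import Tuple

Vec = Tuple[float, float, float]

DT = 0.01          # integration step (assumed value)
MASS = 1.0         # particle mass (assumed value)
KP, KD = 4.0, 2.0  # PD gains for the stand-in policy (assumed values)


def add(a: Vec, b: Vec) -> Vec:
    return tuple(ai + bi for ai, bi in zip(a, b))


def sub(a: Vec, b: Vec) -> Vec:
    return tuple(ai - bi for ai, bi in zip(a, b))


def scale(c: float, a: Vec) -> Vec:
    return tuple(c * ai for ai in a)


def step(x: Vec, v: Vec, force: Vec) -> Tuple[Vec, Vec]:
    """Forward-Euler step of a forced point mass."""
    return add(x, scale(DT, v)), add(v, scale(DT / MASS, force))


def quotient(x: Vec, v: Vec, x_ref: Vec, v_ref: Vec) -> Tuple[Vec, Vec, Vec]:
    """Map the full tracking state (12 numbers) to a reduced state (9 numbers).

    Translating x and x_ref by the same offset leaves the result unchanged,
    so absolute position is factored out.
    """
    return sub(x, x_ref), v, v_ref


def reduced_policy(e: Vec, v: Vec, v_ref: Vec) -> Vec:
    """Hypothetical policy on the reduced state (stand-in for a learned one)."""
    return add(scale(-KP, e), scale(-KD, sub(v, v_ref)))


def lifted_policy(x: Vec, v: Vec, x_ref: Vec, v_ref: Vec) -> Vec:
    """Policy on the original state, obtained by composing with the quotient map."""
    return reduced_policy(*quotient(x, v, x_ref, v_ref))
```

In this sketch the lifted policy returns the same force when the physical and reference positions are shifted by a common offset, so training data gathered at one location applies everywhere along the symmetry orbit.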