Leveraging Symmetry to Accelerate Learning of Trajectory Tracking Controllers for Free-Flying Robotic Systems
Jake Welde, Nishanth Rao, Pratik Kunapuli, Dinesh Jayaraman, Vijay Kumar
arXiv - EE - Systems and Control, 2024-09-17, arxiv-2409.11238
Tracking controllers enable robotic systems to accurately follow planned
reference trajectories. In particular, reinforcement learning (RL) has shown
promise in the synthesis of controllers for systems with complex dynamics and
modest online compute budgets. However, the poor sample efficiency of RL and
the challenges of reward design make training slow and sometimes unstable,
especially for high-dimensional systems. In this work, we leverage the inherent
Lie group symmetries of robotic systems with a floating base to mitigate these
challenges when learning tracking controllers. We model a general tracking
problem as a Markov decision process (MDP) that captures the evolution of both
the physical and reference states. Next, we prove that symmetry in the
underlying dynamics and running costs leads to an MDP homomorphism, a mapping
that allows a policy trained on a lower-dimensional "quotient" MDP to be lifted
to an optimal tracking controller for the original system. We compare this
symmetry-informed approach to an unstructured baseline, using Proximal Policy
Optimization (PPO) to learn tracking controllers for three systems: the
Particle (a forced point mass), the Astrobee (a fully actuated space robot), and
the Quadrotor (an underactuated system). Results show that a symmetry-aware
approach both accelerates training and reduces tracking error after the same
number of training steps.
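
To make the quotient construction concrete, the following is a minimal sketch for a one-dimensional analogue of the Particle, not the authors' implementation. It assumes a unit mass, explicit Euler integration, translation as the only symmetry, and a hand-written PD law standing in for a learned PPO policy. All names (`step`, `cost`, `translate`, `quotient`, `reduced_policy`, `lifted_policy`) and all numeric constants are hypothetical. The sketch checks numerically that the dynamics and running cost are compatible with the group action, so a policy defined on the reduced state can be lifted to the full state.

```python
import math
import random

DT = 0.05  # integration step (assumed value)


def step(state, force):
    """One Euler step of the joint physical + reference state (unit mass)."""
    x, v, x_ref, v_ref = state
    return (x + DT * v, v + DT * force, x_ref + DT * v_ref, v_ref)


def cost(state, force):
    """Running cost depending only on tracking error and control effort."""
    x, v, x_ref, v_ref = state
    return (x - x_ref) ** 2 + 0.1 * (v - v_ref) ** 2 + 0.01 * force ** 2


def translate(state, shift):
    """Group action: shift physical and reference positions together."""
    x, v, x_ref, v_ref = state
    return (x + shift, v, x_ref + shift, v_ref)


def quotient(state):
    """Map the 4-D state to a 3-D reduced state with no absolute position."""
    x, v, x_ref, v_ref = state
    return (x - x_ref, v, v_ref)


def reduced_policy(q):
    """Stand-in for a policy trained on the quotient MDP (a PD law here)."""
    err, v, v_ref = q
    return -4.0 * err - 2.0 * (v - v_ref)


def lifted_policy(state):
    """Lift the reduced policy to the original state space."""
    return reduced_policy(quotient(state))


def close(a, b):
    """Elementwise comparison of two tuples up to floating-point error."""
    return all(math.isclose(p, q, rel_tol=1e-9, abs_tol=1e-9) for p, q in zip(a, b))


def check_symmetry(trials=200, seed=0):
    """Numerically test the conditions behind the homomorphism on random samples."""
    rng = random.Random(seed)
    for _ in range(trials):
        s = tuple(rng.uniform(-5.0, 5.0) for _ in range(4))
        g = rng.uniform(-5.0, 5.0)
        u = rng.uniform(-1.0, 1.0)
        # Dynamics commute with the group action.
        if not close(step(translate(s, g), u), translate(step(s, u), g)):
            return False
        # Running cost is invariant under the group action.
        if not math.isclose(cost(translate(s, g), u), cost(s, u), rel_tol=1e-9, abs_tol=1e-9):
            return False
        # The next reduced state depends only on the current reduced state.
        if not close(quotient(step(translate(s, g), u)), quotient(step(s, u))):
            return False
    return True


def rollout_error(state, steps=400):
    """Run the lifted policy and return the final absolute position error."""
    for _ in range(steps):
        state = step(state, lifted_policy(state))
    return abs(state[0] - state[2])


if __name__ == "__main__":
    print("symmetry conditions hold:", check_symmetry())
    print("final tracking error:", rollout_error((3.0, 0.0, 0.0, 1.0)))
```

In this toy setting the reduced policy sees three numbers instead of four, and by construction the lifted policy gives the same force wherever the particle and its reference sit along the axis. The systems in the paper have richer Lie group symmetries than this one-dimensional translation, so the sketch illustrates only the structure of the argument.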