Effective Search for Control Hierarchies Within the Policy Decomposition Framework

IF 5.3 · Zone 2 (Computer Science) · JCR Q2 (ROBOTICS) · IEEE Robotics and Automation Letters · Pub Date: 2024-10-17 · DOI: 10.1109/LRA.2024.3483635

Ashwin Khadke, Hartmut Geyer

Volume 9, Issue 12, pp. 11114-11121

Citations: 0

Abstract

Policy decomposition is a novel framework for approximating the optimal control policies of complex dynamical systems with a hierarchy of policies derived from smaller, tractable subsystems. It stands out among hierarchical control methods because it estimates a priori how closely the closed-loop behavior of different control hierarchies matches that of the optimal policy. However, the number of possible hierarchies grows prohibitively with the number of inputs and the state-space dimension of the system, making it impractical to estimate the closed-loop performance of every hierarchy. Here, we develop two search methods, one based on a genetic algorithm and one based on Monte Carlo tree search, to tackle this combinatorial challenge, and demonstrate that it is indeed surmountable. We showcase the efficacy of our search methods and the generality of the framework by applying them to find control hierarchies for three distinct robotic systems: a simplified biped, a planar manipulator, and a quadcopter. Compared with heuristically designed hierarchies, the discovered hierarchies either provide improved closed-loop performance or can be computed in minimal time with only marginally worse control performance, and they also exceed the control performance of policies obtained with popular deep reinforcement learning methods.
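To give a feel for the combinatorial growth described above, here is a sketch that counts candidate hierarchies under a deliberately simplified, hypothetical model: the inputs are split into k groups, and every state variable is assigned to exactly one group so that each subsystem receives at least one state. This toy model covers only decoupled structures and omits cascaded hierarchies, so it is an assumption for illustration and not the counting defined in the paper; the function names `stirling2` and `count_decoupled` are hypothetical.

```python
from functools import lru_cache
from math import factorial


@lru_cache(maxsize=None)
def stirling2(n: int, k: int) -> int:
    """Stirling number of the second kind: ways to split n labeled items
    into k nonempty, unlabeled blocks."""
    if n == k:
        return 1
    if k == 0 or k > n:
        return 0
    # Item n either joins one of the k existing blocks or forms a new block.
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)


def count_decoupled(num_inputs: int, num_states: int) -> int:
    """Hypothetical toy count of purely decoupled decompositions.

    Assumption: inputs are partitioned into k groups, and each state variable
    is assigned to exactly one group, with every group getting >= 1 state.
    Cascaded hierarchies are NOT counted here.
    """
    total = 0
    for k in range(1, min(num_inputs, num_states) + 1):
        input_partitions = stirling2(num_inputs, k)
        # Groups are distinguishable by their input sets, so the number of
        # surjective state-to-group assignments is k! * S(num_states, k).
        state_assignments = factorial(k) * stirling2(num_states, k)
        total += input_partitions * state_assignments
    return total


if __name__ == "__main__":
    for m, n in [(2, 4), (3, 6), (4, 8)]:
        print(m, n, count_decoupled(m, n))
```

For instance, `count_decoupled(2, 4)` returns 15, while `count_decoupled(4, 8)` already returns 77379, even though this toy model ignores every cascaded structure.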

Source journal

IEEE Robotics and Automation Letters (Computer Science: Computer Science Applications)

CiteScore: 9.60

Self-citation rate: 15.40%

Articles published: 1428

Journal description: The scope of this journal is to publish peer-reviewed articles that provide a timely and concise account of innovative research ideas and application results, reporting significant theoretical findings and application case studies in areas of robotics and automation.