Greedy-DAgger - A Student Rollout Efficient Imitation Learning Algorithm

IEEE Robotics and Automation Letters | IF 5.3 | Region 2 (Computer Science) | JCR Q2 (Robotics) | Pub Date: 2025-01-29 | DOI: 10.1109/LRA.2025.3536297

Mitchell Torok;Mohammad Deghat;Yang Song

IEEE Robotics and Automation Letters, vol. 10, no. 3, pp. 2878-2885. Journal Article. https://ieeexplore.ieee.org/document/10857457/

Citations: 0

Abstract

Sampling-based model predictive control algorithms can be computationally expensive and may not be feasible for restricted platforms such as quadcopters. By comparison, lightweight learned controllers are computationally cheaper and may be better suited to these platforms. Expert control samples provided by a remote model predictive control algorithm could be used to rapidly train a student policy. We present Greedy-DAgger, a hybrid-policy imitation learning approach that leverages expert simulations to improve the student rollout efficiency during the training of a student policy. Our approach builds on the DAgger algorithm by employing a greedy strategy that selects isolated states from a student trajectory. These states are used to generate expert trajectory samples before supervised learning is performed, and the process is repeated. The effectiveness of the Greedy-DAgger algorithm is evaluated on two simulated robotic systems: a cart pole and a quadcopter. For these environments, Greedy-DAgger was shown to be up to ten times more rollout-efficient than conventional DAgger. The introduced improvements enable expert-level quadcopter control to be achieved within 8 seconds of wall time. The Crazyflie quadcopter platform was then utilised to validate the simulation results and demonstrate the potential for real-world training with Greedy-DAgger on a constrained platform, leveraging access to a remote GPU-accelerated server.
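The training loop described in the abstract (student rollout, selection of isolated states, expert trajectories from those states, supervised learning, repeat) might be sketched roughly as follows. This is an illustrative toy and not the authors' implementation: the double-integrator dynamics, the linear feedback `expert_action` standing in for the remote model predictive controller, the evenly spaced rule in `select_states`, the least-squares student, and all names and parameter values are assumptions, since the abstract does not specify them.

```python
import numpy as np

# Toy double integrator (assumption; the paper uses a cart pole and a quadcopter).
DT = 0.1
A = np.array([[1.0, DT], [0.0, 1.0]])
B = np.array([0.0, DT])
K_EXPERT = np.array([1.0, 1.5])  # hypothetical gain standing in for the MPC expert


def step(x, u):
    """Advance the toy system by one time step."""
    return A @ x + B * u


def expert_action(x):
    """Stand-in expert: linear state feedback instead of sampling-based MPC."""
    return float(-K_EXPERT @ x)


def rollout(policy, x0, horizon):
    """Return the states visited when following `policy` from `x0`."""
    xs, x = [], np.asarray(x0, dtype=float)
    for _ in range(horizon):
        xs.append(x)
        x = step(x, policy(x))
    return np.array(xs)


def select_states(traj, k):
    """Pick k states from a student trajectory.

    Evenly spaced indices are a placeholder; the abstract does not give
    the paper's actual greedy selection criterion.
    """
    idx = np.linspace(0, len(traj) - 1, k).astype(int)
    return traj[idx]


def train(iterations=3, k=4, student_horizon=40, expert_horizon=10):
    w = np.zeros(2)  # linear student policy: u = w @ x
    data_x, data_u = [], []
    for _ in range(iterations):
        # 1. One student rollout per iteration.
        student_traj = rollout(lambda x: float(w @ x), [1.0, 0.0], student_horizon)
        # 2-3. Expert trajectories simulated from the selected student states.
        for x_start in select_states(student_traj, k):
            expert_traj = rollout(expert_action, x_start, expert_horizon)
            data_x.extend(expert_traj)
            data_u.extend(expert_action(x) for x in expert_traj)
        # 4. Supervised learning on the aggregated dataset (least squares).
        w, *_ = np.linalg.lstsq(np.array(data_x), np.array(data_u), rcond=None)
    return w


w = train()
print("student gain:", w, "expert gain:", -K_EXPERT)
```

In this sketch each iteration uses a single student rollout but yields `k * expert_horizon` labelled samples, which is the sense in which expert simulation can reduce the number of student rollouts needed; the efficiency figures reported in the abstract come from the paper's own experiments, not from this toy.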

Source journal

IEEE Robotics and Automation Letters (Computer Science: Computer Science Applications)

CiteScore: 9.60

Self-citation rate: 15.40%

Articles published: 1428

About the journal: The scope of this journal is to publish peer-reviewed articles that provide a timely and concise account of innovative research ideas and application results, reporting significant theoretical findings and application case studies in areas of robotics and automation.