Sim-to-real transfer of active suspension control using deep reinforcement learning

Robotics and Autonomous Systems; IF 5.2; Zone 2 (Computer Science); JCR Q1 (Automation & Control Systems); Pub Date: 2024-09-01; Epub Date: 2024-06-13; DOI: 10.1016/j.robot.2024.104731

Viktor Wiberg, Erik Wallin, Arvid Fälldin, Tobias Semberg, Morgan Rossander, Eddie Wadbro, Martin Servin

Robotics and Autonomous Systems, vol. 179, Article 104731. Full text: https://www.sciencedirect.com/science/article/pii/S0921889024001155

Citations: 0

Abstract




We explore sim-to-real transfer of deep reinforcement learning controllers for a heavy vehicle with active suspensions designed for traversing rough terrain. While related research primarily focuses on lightweight robots with electric motors and fast actuation, this study uses a forestry vehicle with a complex hydraulic driveline and slow actuation. We simulate the vehicle using multibody dynamics and apply system identification to find an appropriate set of simulation parameters. We then train policies in simulation using various techniques to mitigate the sim-to-real gap, including domain randomization, action delays, and a reward penalty to encourage smooth control. In reality, the policies trained with action delays and a penalty for erratic actions perform nearly at the same level as in simulation. In experiments on level ground, the motion trajectories closely overlap when turning to either side, as well as in a route tracking scenario. When faced with a ramp that requires active use of the suspensions, the simulated and real motions are in close alignment. This shows that the actuator model together with system identification yields a sufficiently accurate model of the actuators. We observe that policies trained without the additional action penalty exhibit fast switching or bang–bang control. These present smooth motions and high performance in simulation but transfer poorly to reality. We find that policies make marginal use of the local height map for perception, showing no indications of predictive planning. However, the strong transfer capabilities entail that further development concerning perception and performance can be largely confined to simulation.
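Two of the mitigation techniques named in the abstract, action delays and a reward penalty on erratic actions, can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulation, so everything here is an assumption made for illustration: the names `DelayedActionBuffer`, `action_rate_penalty` and `sample_delay_steps`, the fixed-length delay queue, the squared-difference penalty, and sampling the delay per episode as one simple form of domain randomization.

```python
import random
from collections import deque
from typing import Deque, List, Sequence


class DelayedActionBuffer:
    """Hypothetical helper: holds each action for a fixed number of control steps."""

    def __init__(self, delay_steps: int, neutral_action: Sequence[float]) -> None:
        if delay_steps < 0:
            raise ValueError("delay_steps must be non-negative")
        # Pre-fill with a neutral action so the first outputs are well defined.
        self._queue: Deque[List[float]] = deque(
            [list(neutral_action) for _ in range(delay_steps)]
        )

    def step(self, action: Sequence[float]) -> List[float]:
        """Queue the newest action and return the one that is due now."""
        self._queue.append(list(action))
        return self._queue.popleft()


def action_rate_penalty(
    prev_action: Sequence[float], action: Sequence[float], weight: float
) -> float:
    """Assumed penalty form: negative weighted squared change between actions."""
    return -weight * sum((a - p) ** 2 for a, p in zip(action, prev_action))


def sample_delay_steps(rng: random.Random, low: int, high: int) -> int:
    """Assumed randomization: draw a delay (inclusive bounds) for each episode."""
    return rng.randint(low, high)


if __name__ == "__main__":
    rng = random.Random(0)
    buffer = DelayedActionBuffer(sample_delay_steps(rng, 1, 3), neutral_action=[0.0])
    previous = [0.0]
    # A bang-bang command sequence is penalized at every switch.
    for command in ([1.0], [-1.0], [1.0], [-1.0]):
        applied = buffer.step(command)
        penalty = action_rate_penalty(previous, command, weight=0.1)
        print(f"commanded={command} applied={applied} penalty={penalty:.2f}")
        previous = command
```

In a sketch like this, the penalty is added to the task reward during training, so rapidly switching commands score worse than smooth ones, while the buffer makes the simulated actuator receive each command a few steps late, as a slow hydraulic system might.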


Source journal

Robotics and Autonomous Systems (Engineering & Technology: Robotics)

CiteScore: 9.00
Self-citation rate: 7.00%
Articles published: 164
Review time: 4.5 months

About the journal: Robotics and Autonomous Systems will carry articles describing fundamental developments in the field of robotics, with special emphasis on autonomous systems. An important goal of this journal is to extend the state of the art in both symbolic and sensory-based robot control and learning in the context of autonomous systems. Robotics and Autonomous Systems will carry articles on the theoretical, computational and experimental aspects of autonomous systems, or modules of such systems.