Leveraging randomized smoothing for optimal control of nonsmooth dynamical systems

Impact factor: 3.7 | Zone 2 (Computer Science) | JCR Q2 (Automation & Control Systems) | Journal: Nonlinear Analysis: Hybrid Systems | Publication date: 2024-02-01 | DOI: 10.1016/j.nahs.2024.101468

Quentin Le Lidec, Fabian Schramm, Louis Montaut, Cordelia Schmid, Ivan Laptev, Justin Carpentier

Nonlinear Analysis: Hybrid Systems, vol. 52, Article 101468. Journal article, not open access. https://www.sciencedirect.com/science/article/pii/S1751570X24000050

Citations: 0

Abstract

Optimal control (OC) algorithms such as differential dynamic programming (DDP) take advantage of the derivatives of the dynamics to control physical systems efficiently. Yet, these algorithms are prone to failure when dealing with non-smooth dynamical systems. This can be attributed to factors such as discontinuities in the derivatives of the dynamics or the presence of non-informative gradients. In contrast, reinforcement learning (RL) algorithms have shown better empirical results in scenarios exhibiting non-smooth effects (contacts, friction, etc.). Our approach leverages recent work on randomized smoothing (RS) to tackle non-smoothness issues commonly encountered in optimal control and provides key insights on the interplay between RL and OC through the lens of RS methods. This naturally leads us to introduce the randomized Differential Dynamic Programming (RDDP) algorithm, which handles deterministic but non-smooth dynamics in a very sample-efficient way. The experiments demonstrate that our method can solve classic robotic problems with dry friction and frictional contacts, where classical OC algorithms are likely to fail, and RL algorithms require, in practice, a prohibitive number of samples to find an optimal solution.
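To make the idea of randomized smoothing concrete, the sketch below estimates a Gaussian-smoothed surrogate f_sigma(x) = E[f(x + sigma*z)], z ~ N(0, I), and its gradient for a discontinuous toy function. It is a generic zeroth-order illustration, not the RDDP algorithm from the paper: the function names (`smoothed_value_and_grad`, `step`), the Gaussian noise model, the baseline subtraction, and all parameter values are assumptions chosen for illustration.

```python
import numpy as np


def smoothed_value_and_grad(f, x, sigma=0.1, num_samples=10_000, seed=0):
    """Monte Carlo estimate of f_sigma(x) = E[f(x + sigma * z)] and its gradient.

    Hypothetical helper for illustration; z is drawn from a standard Gaussian.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    z = rng.standard_normal((num_samples, x.size))
    # Evaluate f at randomly perturbed points around x.
    values = np.array([f(x + sigma * zi) for zi in z])
    value = values.mean()
    # Zeroth-order gradient estimator E[(f(x + sigma*z) - f(x)) * z] / sigma.
    # Subtracting the baseline f(x) leaves the expectation unchanged
    # (because E[z] = 0) and typically lowers the variance.
    grad = ((values - f(x))[:, None] * z).mean(axis=0) / sigma
    return value, grad


def step(x):
    """Discontinuous toy function: its derivative is zero almost everywhere."""
    return float(x[0] > 0.0)


# At the discontinuity the exact derivative is undefined (and zero elsewhere),
# while the smoothed surrogate has a finite, informative slope.
value, grad = smoothed_value_and_grad(step, np.array([0.0]), sigma=0.1)
print(value, grad)
```

For this step function the smoothed value at x = 0 is exactly 0.5 and the smoothed slope is 1/(sigma*sqrt(2*pi)), about 3.99 for sigma = 0.1; the Monte Carlo estimates should land close to those values. A larger sigma gives a flatter, smoother surrogate, while a smaller sigma stays closer to the original function at the cost of noisier gradient estimates.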




Source journal

Nonlinear Analysis: Hybrid Systems (categories: Automation & Control Systems; Mathematics, Applied)

CiteScore: 8.30

Self-citation rate: 9.50%

Articles published:

Review time: >12 weeks

About the journal: Nonlinear Analysis: Hybrid Systems welcomes all important research and expository papers in any discipline. Papers that are principally concerned with the theory of hybrid systems should contain significant results indicating relevant applications. Papers that emphasize applications should consist of important real-world models and illuminating techniques. Papers that interrelate various aspects of hybrid systems will be most welcome.