LeTO: Learning Constrained Visuomotor Policy With Differentiable Trajectory Optimization

IEEE Transactions on Automation Science and Engineering · IF 6.4 · Zone 2 (Computer Science) · JCR Q1 (Automation & Control Systems) · Pub Date: 2024-10-31 · DOI: 10.1109/TASE.2024.3486542

Zhengtong Xu;Yu She

IEEE Transactions on Automation Science and Engineering, vol. 22, pp. 8567-8578. Published 2024-10-31. IEEE Xplore: https://ieeexplore.ieee.org/document/10740461/

Citations: 0



LeTO: Learning Constrained Visuomotor Policy With Differentiable Trajectory Optimization

This paper introduces LeTO, a method for learning constrained visuomotor policy with differentiable trajectory optimization. Our approach integrates a differentiable optimization layer into the neural network. By formulating the optimization layer as a trajectory optimization problem, we enable the model to generate actions end-to-end in a safe and constraint-controlled fashion without extra modules. Our method allows constraint information to be introduced during training, thereby balancing three training objectives: satisfying constraints, smoothing the trajectories, and minimizing errors with respect to demonstrations. This “gray box” method marries optimization-based safety and interpretability with the powerful representational abilities of neural networks. We quantitatively evaluate LeTO in simulation and on a real robot. The results demonstrate that LeTO performs well in both simulated and real-world tasks. In addition, it generates trajectories that are less uncertain, higher quality, and smoother than those of existing imitation learning methods. LeTO thus provides a practical example of how to integrate neural networks with trajectory optimization. We release our code at https://github.com/ZhengtongXu/LeTO.

Note to Practitioners: LeTO is driven by the goal of developing an imitation learning algorithm capable of generating safe and constraint-satisfying robotic behaviors. The idea of imitation learning is to enable the robot to learn from human demonstrations of certain tasks and then perform the learned tasks autonomously. Thanks to the powerful representational and fitting capabilities of neural networks, imitation learning can let robots perform complex manipulation tasks. However, neural networks often exhibit a certain level of uncertainty and lack theoretical safety guarantees. For robotic systems, it is crucial that robot behaviors meet specific constraints; otherwise, the system may not be sufficiently reliable. Therefore, we introduce LeTO, an approach that integrates trajectory optimization with neural networks to generate actions that not only achieve manipulation tasks but also comply with constraints. This improves the interpretability, safety, and reliability of robot policies acquired through imitation learning, facilitating their deployment in scenarios with high safety requirements.
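To make the idea of an optimization layer that trades off demonstration error, smoothness, and constraints more concrete, here is a minimal sketch. It is not the authors' implementation (see their repository for that); it is a toy stand-in under stated assumptions: a one-dimensional trajectory, a quadratic cost, simple box constraints, and a projected gradient descent solver. The names `smooth_and_clip`, `roughness`, and `smooth_weight` are hypothetical. Only the forward pass is shown; a trainable layer would also need gradients of the solution with respect to the reference, which this sketch omits.

```python
from typing import List


def roughness(x: List[float]) -> float:
    """Sum of squared differences between consecutive waypoints."""
    return sum((b - a) ** 2 for a, b in zip(x, x[1:]))


def smooth_and_clip(reference: List[float], lo: float, hi: float,
                    smooth_weight: float = 1.0, iters: int = 500) -> List[float]:
    """Approximately solve the toy trajectory optimization problem

        min_x  sum_t (x_t - r_t)^2 + w * sum_t (x_{t+1} - x_t)^2
        s.t.   lo <= x_t <= hi  for every t

    where r is the reference (e.g. a network's raw output) and w is
    smooth_weight. Solved by projected gradient descent.
    """
    n = len(reference)
    # Start from the reference projected onto the box constraints.
    x = [min(max(r, lo), hi) for r in reference]
    # The Hessian is 2*I + 2*w*(path-graph Laplacian); its largest
    # eigenvalue is at most 2 + 8*w, so 1/(2 + 8*w) is a safe step size.
    step = 1.0 / (2.0 + 8.0 * smooth_weight)
    for _ in range(iters):
        # Gradient of the tracking term.
        grad = [2.0 * (x[t] - reference[t]) for t in range(n)]
        # Gradient of the smoothness term.
        for t in range(n - 1):
            d = 2.0 * smooth_weight * (x[t + 1] - x[t])
            grad[t] -= d
            grad[t + 1] += d
        # Gradient step followed by projection back onto [lo, hi].
        x = [min(max(x[t] - step * grad[t], lo), hi) for t in range(n)]
    return x


if __name__ == "__main__":
    raw = [0.0, 1.5, -0.2, 1.4, 0.1, 1.6]  # jittery, constraint-violating reference
    safe = smooth_and_clip(raw, lo=0.0, hi=1.0)
    print("output:", [round(v, 3) for v in safe])
    print("roughness raw vs. output:", roughness(raw), roughness(safe))
```

In this toy setting every output waypoint lies inside `[lo, hi]` by construction, and the output is smoother than simply clipping the reference. Raising `smooth_weight` favors smoothness over fidelity to the reference, which mirrors the balance of objectives described in the abstract.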


Source journal

IEEE Transactions on Automation Science and Engineering (Engineering and Technology: Automation and Control Systems)

CiteScore: 12.50
Self-citation rate: 14.30%
Publication volume: 404
Review time: 3.0 months

About the journal: The IEEE Transactions on Automation Science and Engineering (T-ASE) publishes fundamental papers on Automation, emphasizing scientific results that advance efficiency, quality, productivity, and reliability. T-ASE encourages interdisciplinary approaches from computer science, control systems, electrical engineering, mathematics, mechanical engineering, operations research, and other fields. T-ASE welcomes results relevant to industries such as agriculture, biotechnology, healthcare, home automation, maintenance, manufacturing, pharmaceuticals, retail, security, service, supply chains, and transportation. T-ASE addresses a research community willing to integrate knowledge across disciplines and industries. For this purpose, each paper includes a Note to Practitioners that summarizes how its results can be applied or how they might be extended to apply in practice.