ROSCOM: Robust Safe Reinforcement Learning on Stochastic Constraint Manifolds

IF 6.4 | Zone 2 (Computer Science) | JCR Q1, Automation & Control Systems | IEEE Transactions on Automation Science and Engineering | Pub Date: 2024-07-31 | DOI: 10.1109/TASE.2024.3431530

Shangding Gu;Puze Liu;Alap Kshirsagar;Guang Chen;Jan Peters;Alois Knoll

IEEE Transactions on Automation Science and Engineering, vol. 22, pp. 5841-5851. Journal Article. https://ieeexplore.ieee.org/document/10616119/

Citations: 0

Abstract


ROSCOM: Robust Safe Reinforcement Learning on Stochastic Constraint Manifolds

Reinforcement Learning (RL) has demonstrated remarkable success across various domains. Nonetheless, a significant challenge in RL is to ensure safety, particularly when deploying it in safety-critical applications such as robotics and autonomous driving. In this work, we develop a robust and safe RL methodology grounded in manifold space. First, we construct a constrained manifold space that takes safety constraints into consideration. We then propose a robust safe RL approach, supported by theoretical analysis and based on the value at risk (VaR) and conditional value at risk (CVaR), to make safety more robust. Our methodology is designed to ensure safety in environments with stochastic constraints. Following the theoretical analysis, we develop a practical algorithm that searches for a robust safe policy on stochastic constraint manifolds (ROSCOM). We evaluate our approach on circular-motion and air-hockey tasks. Our experiments demonstrate that ROSCOM outperforms existing baselines in terms of both reward and safety.

Note to Practitioners: Real-world applications often involve inherent uncertainties, noise, and high-dimensional spaces. This complexity makes ensuring safety in robot learning both more urgent and more difficult, especially when implementing RL in practical environments. To address this issue, we build a stochastic constraint manifold to delineate the safe space, thus establishing a rigorous framework for robot learning at each iteration. Compared with state-of-the-art baselines, our method performs better in both safety and reward. For example, in an air-hockey robot learning task, our method achieves a 50% improvement in safety performance over the ATACOM framework while also obtaining higher reward. Moreover, compared with traditional algorithms, including CPO and PCPO, our method achieves a 99% improvement in safety performance together with significantly higher reward. These empirical results suggest that our approach is not only theoretically sound but also effective in practice, indicating its potential as a useful tool in real robot learning and beyond.
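The abstract names value at risk and conditional value at risk as the basis of the robustness analysis but does not define them. As a rough illustration only, and not the authors' algorithm, the sketch below computes the standard empirical versions of both quantities from sampled constraint costs: VaR at level alpha is the alpha-quantile of the cost, and CVaR is the mean cost in the tail at or beyond that quantile. The function names `var_cvar` and `satisfies_cvar_constraint`, the default `alpha=0.95`, and the `limit` parameter are hypothetical choices made for this example.

```python
import numpy as np


def var_cvar(costs, alpha=0.95):
    """Empirical VaR and CVaR of sampled constraint costs (illustrative only).

    VaR is the alpha-quantile of the samples; CVaR is the mean of the
    samples at or above that quantile (the worst-case tail).
    """
    costs = np.asarray(costs, dtype=float)
    var = float(np.quantile(costs, alpha))
    tail = costs[costs >= var]  # never empty: the sample maximum is >= var
    cvar = float(tail.mean())
    return var, cvar


def satisfies_cvar_constraint(costs, limit, alpha=0.95):
    """Hypothetical safety check: is the tail-average cost within `limit`?"""
    _, cvar = var_cvar(costs, alpha)
    return cvar <= limit


if __name__ == "__main__":
    # Assumed toy data: noisy constraint costs from simulated rollouts.
    rng = np.random.default_rng(0)
    sampled_costs = rng.normal(loc=0.0, scale=1.0, size=10_000)
    var, cvar = var_cvar(sampled_costs, alpha=0.95)
    print(f"VaR={var:.3f}  CVaR={cvar:.3f}")
    print("safe:", satisfies_cvar_constraint(sampled_costs, limit=2.5))
```

Because CVaR averages over the whole tail, it is never smaller than VaR at the same level, which is why a CVaR bound is the more conservative of the two safety criteria.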


Source journal

IEEE Transactions on Automation Science and Engineering (Engineering & Technology: Automation & Control Systems)

CiteScore: 12.50

Self-citation rate: 14.30%

Articles published: 404

Review time: 3.0 months

About the journal: The IEEE Transactions on Automation Science and Engineering (T-ASE) publishes fundamental papers on Automation, emphasizing scientific results that advance efficiency, quality, productivity, and reliability. T-ASE encourages interdisciplinary approaches from computer science, control systems, electrical engineering, mathematics, mechanical engineering, operations research, and other fields. T-ASE welcomes results relevant to industries such as agriculture, biotechnology, healthcare, home automation, maintenance, manufacturing, pharmaceuticals, retail, security, service, supply chains, and transportation. T-ASE addresses a research community willing to integrate knowledge across disciplines and industries. For this purpose, each paper includes a Note to Practitioners that summarizes how its results can be applied or how they might be extended to apply in practice.