Lei Xia;Yunduan Cui;Zhengkun Yi;Huiyun Li;Xinyu Wu
DOI: 10.1109/TASE.2024.3492174
Journal: IEEE Transactions on Automation Science and Engineering, vol. 22, pp. 8898-8911
Published: 2024-11-12 (Journal Article)
Impact Factor: 6.4 · JCR: Q1 (AUTOMATION & CONTROL SYSTEMS) · Region: 2 (Computer Science)
URL: https://ieeexplore.ieee.org/document/10750444/
Citations: 0
Abstract
Estimating Lyapunov Region of Attraction for Robust Model-Based Reinforcement Learning USV
This article addresses the robustness of unmanned surface vehicles (USVs) using model-based reinforcement learning (MBRL). A novel MBRL approach, Lyapunov probabilistic model predictive control (LPMPC), is proposed to simultaneously learn both the probabilistic model of a USV and its corresponding estimated Lyapunov region of attraction (ROA) under one reinforcement learning framework. Unlike existing MBRL USV systems, which give little consideration to robustness and safety, our method naturally learns a general indicator of system stability based on the probabilistic model’s belief and employs it to guide its policy. Evaluated on different navigation tasks in a simulation driven by real boat data, LPMPC demonstrated significant advantages in both control robustness and task completion against various levels of environmental disturbance compared with the baseline approach without Lyapunov ROA guidance. Note to Practitioners—Modelling system stability without human prior knowledge is challenging in the domain of USVs. This work proposes a data-driven method to iteratively learn a task-relevant stability model of a USV from a probabilistic view. In an evaluation on a real boat data-driven simulation, the learned stability model contributed to superior driving skills in different USV scenarios by properly indicating and avoiding potentially risky states. In future research, we plan to expand the definition of risks in different tasks, such as loss of control, overlarge sway, and excessive energy consumption, and to investigate the proposed approach on real-world USVs.
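The abstract gives no implementation detail of LPMPC itself. As a loose, self-contained illustration of the general idea of estimating a Lyapunov region of attraction from data, the following Python sketch samples states of a toy one-dimensional discrete-time system and finds the largest level set of a quadratic Lyapunov candidate inside which the one-step decrease condition holds. The dynamics `f`, the candidate `V`, and all constants are hypothetical stand-ins and are not the paper's method.

```python
import numpy as np

def f(x):
    """Toy discrete-time dynamics (hypothetical): stable near the origin,
    divergent for |x| > 1, so the true ROA is V(x) <= 1."""
    return 0.9 * x + 0.1 * x**3

def V(x):
    """Quadratic Lyapunov candidate V(x) = x^2 (hypothetical choice)."""
    return x**2

def estimate_roa_level(n_samples=2000, x_max=3.0, seed=0):
    """Largest level c such that V strictly decreases in one step for all
    sampled states with V(x) <= c (sampling-based ROA estimate)."""
    rng = np.random.default_rng(seed)
    xs = rng.uniform(-x_max, x_max, n_samples)
    decreasing = V(f(xs)) < V(xs)           # one-step decrease condition
    violating = xs[~decreasing]
    if violating.size == 0:
        return float(V(x_max))              # no violation found in the box
    # the level set must exclude every state that violates the decrease
    return float(np.min(V(violating)))

c = estimate_roa_level()
print(f"estimated ROA level set: V(x) <= {c:.3f}")
```

For this toy system the decrease condition fails exactly when x² ≥ 1, so the sampled estimate lands just above 1. The paper's contribution, by contrast, couples such an ROA estimate to a learned probabilistic model and uses it to guide the MPC policy, which this sketch does not attempt.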
Journal Introduction:
The IEEE Transactions on Automation Science and Engineering (T-ASE) publishes fundamental papers on Automation, emphasizing scientific results that advance efficiency, quality, productivity, and reliability. T-ASE encourages interdisciplinary approaches from computer science, control systems, electrical engineering, mathematics, mechanical engineering, operations research, and other fields. T-ASE welcomes results relevant to industries such as agriculture, biotechnology, healthcare, home automation, maintenance, manufacturing, pharmaceuticals, retail, security, service, supply chains, and transportation. T-ASE addresses a research community willing to integrate knowledge across disciplines and industries. For this purpose, each paper includes a Note to Practitioners that summarizes how its results can be applied or how they might be extended to apply in practice.