Estimating Lyapunov Region of Attraction for Robust Model-Based Reinforcement Learning USV

Impact Factor: 6.4 | Zone 2 (Computer Science) | JCR Q1, AUTOMATION & CONTROL SYSTEMS | IEEE Transactions on Automation Science and Engineering | Pub Date: 2024-11-12 | DOI: 10.1109/TASE.2024.3492174
Lei Xia;Yunduan Cui;Zhengkun Yi;Huiyun Li;Xinyu Wu
IEEE Transactions on Automation Science and Engineering, vol. 22, pp. 8898-8911 (Journal Article). Full text: https://ieeexplore.ieee.org/document/10750444/
Citation count: 0

Abstract

This article addresses the robustness of unmanned surface vehicles (USVs) controlled by model-based reinforcement learning (MBRL). A novel MBRL approach, Lyapunov probabilistic model predictive control (LPMPC), is proposed to simultaneously learn both the probabilistic model of a USV and its corresponding estimated Lyapunov region of attraction (ROA) under one reinforcement learning framework. Unlike existing MBRL USV systems, which give less consideration to robustness and safety, our method naturally learns a general indicator of system stability based on the probabilistic model's belief and employs it to guide its policy. Evaluated on different navigation tasks in a simulation driven by real boat data, LPMPC demonstrated significant advantages in both control robustness and task completion under various levels of environmental disturbance compared with the baseline approach without the Lyapunov ROA's guidance.

Note to Practitioners: Modelling system stability without human prior knowledge is challenging in the USV domain. This work proposes a data-driven method to iteratively learn a task-relevant stability model of a USV from a probabilistic view. Based on evaluation in a simulation driven by real boat data, the learned stability model contributed to superior driving skills in different USV scenarios by properly indicating and avoiding potentially risky states. In future research, we plan to expand the definition of risks in different tasks, such as loss of control, excessive sway, and excessive energy consumption, and to investigate the proposed approach on real-world USVs.
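For readers unfamiliar with the term, the general idea of an estimated Lyapunov region of attraction can be illustrated on a toy problem. The sketch below is not LPMPC and does not use a USV model or a learned probabilistic model; it assumes a hypothetical one-dimensional system dx/dt = -x + x^3, a hand-picked Lyapunov candidate V(x) = x^2, and hypothetical function names (`step`, `lyapunov`, `estimate_roa_level`). It samples states and returns the largest level c such that V strictly decreases after one step at every sampled state with V(x) < c.

```python
def step(x, dt=0.1):
    """One Euler step of the toy system dx/dt = -x + x**3 (assumed, not a USV model)."""
    return x + dt * (-x + x ** 3)


def lyapunov(x):
    """Hand-picked Lyapunov candidate V(x) = x^2 for the origin (an assumption)."""
    return x * x


def estimate_roa_level(samples, eps=1e-12):
    """Return a level c so that V decreases at every sample with V(x) < c.

    A sample "violates" the decrease condition if V does not strictly
    decrease after one step. Samples at (numerically) the origin are skipped,
    because V cannot decrease below zero there.
    """
    violating = [
        lyapunov(x)
        for x in samples
        if lyapunov(x) > eps and lyapunov(step(x)) >= lyapunov(x)
    ]
    if violating:
        # Smallest V value at which the decrease condition fails.
        return min(violating)
    # No violation found on the samples: the whole sampled range passes.
    return max(lyapunov(x) for x in samples)


# Grid of candidate states in [-2, 2].
samples = [-2.0 + 0.01 * i for i in range(401)]
c = estimate_roa_level(samples)

# States with V(x) < c lie inside the estimated region of attraction.
for x0 in (0.5, 1.5):
    print(x0, "inside estimate" if lyapunov(x0) < c else "outside estimate")
```

For this toy system the estimate comes out close to c = 1, which matches the known region |x| < 1. A sample-based check like this only certifies the sampled states, so in practice the estimate would be treated as an indicator rather than a guarantee.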
Source journal
IEEE Transactions on Automation Science and Engineering (Engineering & Technology: Automation & Control Systems)
CiteScore: 12.50
Self-citation rate: 14.30%
Articles published: 404
Review time: 3.0 months
Journal description: The IEEE Transactions on Automation Science and Engineering (T-ASE) publishes fundamental papers on Automation, emphasizing scientific results that advance efficiency, quality, productivity, and reliability. T-ASE encourages interdisciplinary approaches from computer science, control systems, electrical engineering, mathematics, mechanical engineering, operations research, and other fields. T-ASE welcomes results relevant to industries such as agriculture, biotechnology, healthcare, home automation, maintenance, manufacturing, pharmaceuticals, retail, security, service, supply chains, and transportation. T-ASE addresses a research community willing to integrate knowledge across disciplines and industries. For this purpose, each paper includes a Note to Practitioners that summarizes how its results can be applied or how they might be extended to apply in practice.