{"title":"Path-Following Control of Unmanned Underwater Vehicle Based on an Improved TD3 Deep Reinforcement Learning","authors":"Yexin Fan;Hongyang Dong;Xiaowei Zhao;Petr Denissenko","doi":"10.1109/TCST.2024.3377876","DOIUrl":null,"url":null,"abstract":"This work proposes an innovative path-following control method, anchored in deep reinforcement learning (DRL), for unmanned underwater vehicles (UUVs). This approach is driven by several new designs, all of which aim to enhance learning efficiency and effectiveness and achieve high-performance UUV control. Specifically, a novel experience replay strategy is designed and integrated within the twin-delayed deep deterministic policy gradient algorithm (TD3). It distinguishes the significance of stored transitions by making a trade-off between rewards and temporal-difference (TD) errors, thus enabling the UUV agent to explore optimal control policies more efficiently. Another major challenge within this control problem arises from action oscillations associated with DRL policies. This issue leads to excessive system wear on actuators and makes real-time application difficult. To mitigate this challenge, a newly improved regularization method is proposed, which provides a moderate level of smoothness to the control policy. Furthermore, a dynamic reward function featuring adaptive constraints is designed to avoid unproductive exploration and expedite learning convergence speed further. Simulation results show that our method garners higher rewards in fewer training episodes compared with mainstream DRL-based control approaches (e.g., deep deterministic policy gradient (DDPG) and vanilla TD3) in UUV applications. Moreover, it can adapt to varying path configurations amid uncertainties and disturbances, all while ensuring high tracking accuracy. Simulation and experimental studies are conducted to verify the effectiveness.","PeriodicalId":13103,"journal":{"name":"IEEE Transactions on Control Systems Technology","volume":"32 5","pages":"1904-1919"},"PeriodicalIF":4.9000,"publicationDate":"2024-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Control Systems Technology","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10480708/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"AUTOMATION & CONTROL SYSTEMS","Score":null,"Total":0}
Citations: 0
Abstract
This work proposes an innovative path-following control method, anchored in deep reinforcement learning (DRL), for unmanned underwater vehicles (UUVs). The approach is driven by several new designs, all of which aim to enhance learning efficiency and effectiveness and to achieve high-performance UUV control. Specifically, a novel experience replay strategy is designed and integrated within the twin-delayed deep deterministic policy gradient (TD3) algorithm. It distinguishes the significance of stored transitions by making a trade-off between rewards and temporal-difference (TD) errors, thus enabling the UUV agent to explore optimal control policies more efficiently. Another major challenge in this control problem arises from the action oscillations associated with DRL policies, which cause excessive actuator wear and make real-time application difficult. To mitigate this challenge, an improved regularization method is proposed that imposes a moderate level of smoothness on the control policy. Furthermore, a dynamic reward function featuring adaptive constraints is designed to avoid unproductive exploration and further expedite learning convergence. Simulation results show that the method garners higher rewards in fewer training episodes than mainstream DRL-based control approaches for UUV applications, such as the deep deterministic policy gradient (DDPG) and vanilla TD3. Moreover, it adapts to varying path configurations amid uncertainties and disturbances while ensuring high tracking accuracy. Simulation and experimental studies are conducted to verify the effectiveness of the proposed method.
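To make the two core ideas summarized above, reward/TD-error-weighted experience replay and action-smoothness regularization, more concrete, the sketch below gives a minimal Python illustration. It is not the authors' implementation: the names (`RewardTDPrioritizedBuffer`, `smoothed_actor_loss`), the trade-off weight `kappa`, the priority exponent `alpha`, and the smoothness weight `lambda_s` are hypothetical, and the priority formula is an assumed form consistent with the abstract's description.

```python
import numpy as np


class RewardTDPrioritizedBuffer:
    """Replay buffer whose sampling priority trades off reward and TD error.

    Hypothetical sketch of the idea described in the abstract: transitions
    are ranked by a weighted mix of their reward and |TD error|, so that
    informative, high-value experiences are replayed more often. `kappa`
    and `alpha` are illustrative hyperparameters, not values from the paper.
    """

    def __init__(self, capacity, kappa=0.5, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.kappa = kappa      # reward vs. TD-error trade-off weight
        self.alpha = alpha      # priority exponent, as in standard PER
        self.eps = eps          # keeps every priority strictly positive
        self.storage, self.priorities = [], []

    def add(self, transition, reward, td_error):
        # Priority mixes |TD error| (learning potential) and reward (task value);
        # rewards are assumed to be scaled, so the mix is clamped at zero.
        p = max(self.kappa * abs(td_error) + (1.0 - self.kappa) * reward, 0.0) + self.eps
        if len(self.storage) >= self.capacity:
            self.storage.pop(0)
            self.priorities.pop(0)
        self.storage.append(transition)
        self.priorities.append(p)

    def sample(self, batch_size):
        # Sample indices in proportion to priority^alpha.
        pr = np.asarray(self.priorities, dtype=np.float64) ** self.alpha
        probs = pr / pr.sum()
        idx = np.random.choice(len(self.storage), size=batch_size, p=probs)
        return [self.storage[i] for i in idx], idx


def smoothed_actor_loss(q_values, actions, prev_actions, lambda_s=0.1):
    """Actor objective with an action-smoothness regularizer (sketch).

    Penalizing differences between consecutive actions discourages the
    oscillatory control signals mentioned in the abstract. `lambda_s` is
    an assumed regularization weight.
    """
    smoothness_penalty = np.mean((actions - prev_actions) ** 2)
    return -np.mean(q_values) + lambda_s * smoothness_penalty
```

A priority of this shape biases replay toward transitions that are both surprising (large TD error) and task-relevant (high reward), while the smoothness term trades a small amount of return for a less oscillatory policy; the exact formulas and hyperparameters used in the paper may differ.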
About the journal:
The IEEE Transactions on Control Systems Technology publishes high-quality technical papers on technological advances in control engineering. The word technology comes from the Greek technologia; its modern meaning is a scientific method to achieve a practical purpose. Control systems technology includes all aspects of control engineering needed to implement practical control systems, from analysis and design through simulation and hardware. A primary purpose of the IEEE Transactions on Control Systems Technology is to provide an archival publication that bridges the gap between theory and practice. Papers published in the IEEE Transactions on Control Systems Technology disclose significant new knowledge, exploratory developments, or practical applications in all aspects of technology needed to implement control systems, from analysis and design through simulation and hardware.