IEEE Transactions on Control Systems Technology, vol. 32, no. 5, pp. 1904-1919. Published 2024-03-27 (Journal Article). DOI: 10.1109/TCST.2024.3377876.
Authors: Yexin Fan; Hongyang Dong; Xiaowei Zhao; Petr Denissenko
Path-Following Control of Unmanned Underwater Vehicle Based on an Improved TD3 Deep Reinforcement Learning
This work proposes an innovative path-following control method, anchored in deep reinforcement learning (DRL), for unmanned underwater vehicles (UUVs). The approach is driven by several new designs, all of which aim to enhance learning efficiency and effectiveness and achieve high-performance UUV control. Specifically, a novel experience replay strategy is designed and integrated within the twin-delayed deep deterministic policy gradient (TD3) algorithm. It ranks stored transitions by significance through a trade-off between rewards and temporal-difference (TD) errors, enabling the UUV agent to explore optimal control policies more efficiently. Another major challenge in this control problem arises from the action oscillations associated with DRL policies; such oscillations cause excessive actuator wear and make real-time application difficult. To mitigate this, an improved regularization method is proposed that imposes a moderate level of smoothness on the control policy. Furthermore, a dynamic reward function with adaptive constraints is designed to avoid unproductive exploration and further expedite learning convergence. Simulation results show that our method garners higher rewards in fewer training episodes than mainstream DRL-based control approaches (e.g., deep deterministic policy gradient (DDPG) and vanilla TD3) in UUV applications. Moreover, it adapts to varying path configurations under uncertainties and disturbances while ensuring high tracking accuracy. Simulation and experimental studies are conducted to verify its effectiveness.
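The abstract describes an experience replay strategy that scores each stored transition by trading off its reward against its TD error, so that sampling favors the more significant transitions. The sketch below illustrates that general idea; the mixing weight `alpha`, the exponent `beta`, and the buffer structure are illustrative assumptions, not the paper's actual formulation.

```python
import random

class RewardTDReplayBuffer:
    """Minimal sketch of a replay buffer whose sampling probability for
    each transition combines its reward magnitude and its TD error.
    All hyperparameter values here are assumptions for illustration."""

    def __init__(self, capacity, alpha=0.5, beta=0.6, eps=1e-3):
        self.capacity = capacity
        self.alpha = alpha    # trade-off between reward and TD-error terms
        self.beta = beta      # how strongly priorities skew the sampling
        self.eps = eps        # keeps every priority strictly positive
        self.buffer = []      # transitions: (state, action, reward, next_state, done)
        self.priorities = []  # one priority score per stored transition

    def _priority(self, reward, td_error):
        # Convex combination of |reward| and |TD error|: either a large
        # reward or a large TD error can mark a transition as significant.
        score = self.alpha * abs(reward) + (1.0 - self.alpha) * abs(td_error)
        return (score + self.eps) ** self.beta

    def add(self, transition, td_error):
        if len(self.buffer) >= self.capacity:
            # Drop the oldest transition once the buffer is full.
            self.buffer.pop(0)
            self.priorities.pop(0)
        self.buffer.append(transition)
        self.priorities.append(self._priority(transition[2], td_error))

    def sample(self, batch_size):
        # Sample (with replacement) in proportion to the priority scores.
        total = sum(self.priorities)
        weights = [p / total for p in self.priorities]
        indices = random.choices(range(len(self.buffer)), weights=weights, k=batch_size)
        return [self.buffer[i] for i in indices]
```

In a TD3 training loop, `add` would be called after each environment step (with the critic's current TD error) and `sample` would supply minibatches for the actor and critic updates; the priorities bias learning toward transitions the trade-off deems informative.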
Journal Introduction:
The IEEE Transactions on Control Systems Technology publishes high-quality technical papers on technological advances in control engineering. The word technology derives from the Greek technologia; its modern meaning is a scientific method to achieve a practical purpose. Control systems technology encompasses all aspects of control engineering needed to implement practical control systems, from analysis and design through simulation and hardware. A primary purpose of the journal is to provide an archival publication that bridges the gap between theory and practice. Papers published in it disclose significant new knowledge, exploratory developments, or practical applications across this full range of topics.