{"title":"基于深度强化学习方法的多静态障碍物环境下的USV避障","authors":"Dengyao Jiang, Mingzhe Yuan, Junfeng Xiong, Jinchao Xiao, Yong Duan","doi":"10.1177/00202940231195937","DOIUrl":null,"url":null,"abstract":"Unmanned surface vehicles (USVs) are intelligent platforms for unmanned surface navigation based on artificial intelligence, motion control, environmental awareness, and other professional technologies. Obstacle avoidance is an important part of its autonomous navigation. Although the USV works in the water environment (e.g. monitoring and tracking, search and rescue scenarios), the dynamic and complex operating environment makes the traditional methods not suitable for solving the obstacle avoidance problem of the USV. In this paper, to address the issue of poor convergence of the Twin Delayed Deep Deterministic policy gradient (TD3) algorithm of Deep Reinforcement Learning (DRL) in an unstructured environment and wave current interference, random walk policy is proposed to deposit the pre-exploration policy of the algorithm into the experience pool to accelerate the convergence of the algorithm and thus achieve USV obstacle avoidance, which can achieve collision-free navigation from any start point to a given end point in a dynamic and complex environment without offline trajectory and track point generation. We design a pre-exploration policy for the environment and a virtual simulation environment for training and testing the algorithm and give the reward function and training method. 
The simulation results show that our proposed algorithm is more manageable to converge than the original algorithm and can perform better in complex environments in terms of obstacle avoidance behavior, reflecting the algorithm’s feasibility and effectiveness.","PeriodicalId":49849,"journal":{"name":"Measurement & Control","volume":"26 1","pages":"0"},"PeriodicalIF":1.3000,"publicationDate":"2023-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Obstacle avoidance USV in multi-static obstacle environments based on a deep reinforcement learning approach\",\"authors\":\"Dengyao Jiang, Mingzhe Yuan, Junfeng Xiong, Jinchao Xiao, Yong Duan\",\"doi\":\"10.1177/00202940231195937\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Unmanned surface vehicles (USVs) are intelligent platforms for unmanned surface navigation based on artificial intelligence, motion control, environmental awareness, and other professional technologies. Obstacle avoidance is an important part of its autonomous navigation. Although the USV works in the water environment (e.g. monitoring and tracking, search and rescue scenarios), the dynamic and complex operating environment makes the traditional methods not suitable for solving the obstacle avoidance problem of the USV. In this paper, to address the issue of poor convergence of the Twin Delayed Deep Deterministic policy gradient (TD3) algorithm of Deep Reinforcement Learning (DRL) in an unstructured environment and wave current interference, random walk policy is proposed to deposit the pre-exploration policy of the algorithm into the experience pool to accelerate the convergence of the algorithm and thus achieve USV obstacle avoidance, which can achieve collision-free navigation from any start point to a given end point in a dynamic and complex environment without offline trajectory and track point generation. 
We design a pre-exploration policy for the environment and a virtual simulation environment for training and testing the algorithm and give the reward function and training method. The simulation results show that our proposed algorithm is more manageable to converge than the original algorithm and can perform better in complex environments in terms of obstacle avoidance behavior, reflecting the algorithm’s feasibility and effectiveness.\",\"PeriodicalId\":49849,\"journal\":{\"name\":\"Measurement & Control\",\"volume\":\"26 1\",\"pages\":\"0\"},\"PeriodicalIF\":1.3000,\"publicationDate\":\"2023-10-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Measurement & Control\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1177/00202940231195937\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"AUTOMATION & CONTROL SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Measurement & Control","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1177/00202940231195937","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"AUTOMATION & CONTROL SYSTEMS","Score":null,"Total":0}
Citations: 0
Abstract
Obstacle avoidance USV in multi-static obstacle environments based on a deep reinforcement learning approach
Unmanned surface vehicles (USVs) are intelligent platforms for unmanned surface navigation built on artificial intelligence, motion control, environmental awareness, and other specialized technologies. Obstacle avoidance is an essential part of their autonomous navigation. Because the USV operates in water environments (e.g. monitoring and tracking, or search and rescue scenarios), the dynamic and complex operating conditions render traditional methods unsuitable for its obstacle avoidance problem. In this paper, to address the poor convergence of the Twin Delayed Deep Deterministic policy gradient (TD3) algorithm of Deep Reinforcement Learning (DRL) in unstructured environments with wave and current interference, a random-walk policy is proposed as a pre-exploration policy whose experience is deposited into the experience pool, accelerating the algorithm's convergence and thereby achieving USV obstacle avoidance: collision-free navigation from any start point to a given end point in a dynamic, complex environment without offline trajectory or track-point generation. We design a pre-exploration policy for the environment and a virtual simulation environment for training and testing the algorithm, and give the reward function and training method. The simulation results show that the proposed algorithm converges more readily than the original algorithm and exhibits better obstacle avoidance behavior in complex environments, demonstrating its feasibility and effectiveness.
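The pre-exploration idea described above can be sketched in a few lines: before any TD3 updates, a random-walk policy is rolled out in the environment and its transitions are stored in the experience pool (replay buffer), so that early gradient steps sample from already-collected experience. The sketch below uses a hypothetical toy 2D environment with static circular obstacles; the environment dynamics, reward values, and all names (`ReplayBuffer`, `random_walk_prefill`, the obstacle and goal coordinates) are illustrative assumptions, not the paper's actual simulator or reward function.

```python
import random
from collections import deque


class ReplayBuffer:
    """Minimal experience pool: stores (s, a, r, s', done) transitions."""

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)


def random_walk_prefill(buffer, steps=1000,
                        obstacles=((3.0, 3.0, 1.0),), goal=(8.0, 8.0)):
    """Fill the buffer with random-walk transitions before TD3 training.

    Toy dynamics: a point agent takes uniformly random 2D actions; it is
    penalized for entering a circular obstacle, rewarded at the goal, and
    pays a small step cost otherwise (illustrative shaping only).
    """
    x, y = 0.0, 0.0
    for _ in range(steps):
        action = (random.uniform(-1, 1), random.uniform(-1, 1))  # random walk
        nx, ny = x + 0.1 * action[0], y + 0.1 * action[1]
        collided = any((nx - ox) ** 2 + (ny - oy) ** 2 < r ** 2
                       for ox, oy, r in obstacles)
        at_goal = (nx - goal[0]) ** 2 + (ny - goal[1]) ** 2 < 0.25
        reward = -10.0 if collided else (10.0 if at_goal else -0.01)
        done = collided or at_goal
        buffer.add((x, y), action, reward, (nx, ny), done)
        # Reset to the start on a terminal transition, otherwise continue.
        x, y = (0.0, 0.0) if done else (nx, ny)
    return buffer


buf = random_walk_prefill(ReplayBuffer())
print(len(buf))  # 1000 pre-exploration transitions now seed the pool
```

TD3's critic and actor updates would then sample minibatches from `buf` from the first gradient step, rather than waiting for the (initially poor) learned policy to gather enough experience on its own.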
Journal introduction:
Measurement and Control publishes peer-reviewed practical and technical research and news pieces from both the science and engineering industry and academia. Whilst focusing more broadly on topics of relevance for practitioners in instrumentation and control, the journal also includes updates on both product and business announcements and information on technical advances.