Path Planning for Autonomous Vehicles in Unknown Dynamic Environment Based on Deep Reinforcement Learning

Applied Sciences-Basel | IF 2.5 | JCR Q2 (CHEMISTRY, MULTIDISCIPLINARY) | CAS Tier 4, Multidisciplinary Journal | Pub Date: 2023-09-06 | DOI: 10.3390/app131810056
Hui Hu, Yuge Wang, Wenjie Tong, Jiao Zhao, Yulei Gu
{"title":"Path Planning for Autonomous Vehicles in Unknown Dynamic Environment Based on Deep Reinforcement Learning","authors":"Hui Hu, Yuge Wang, Wenjie Tong, Jiao Zhao, Yulei Gu","doi":"10.3390/app131810056","DOIUrl":null,"url":null,"abstract":"Autonomous vehicles can reduce labor power during cargo transportation, and then improve transportation efficiency, for example, the automated guided vehicle (AGV) in the warehouse can improve the operation efficiency. To overcome the limitations of traditional path planning algorithms in unknown environments, such as reliance on high-precision maps, lack of generalization ability, and obstacle avoidance capability, this study focuses on investigating the Deep Q-Network and its derivative algorithm to enhance network and algorithm structures. A new algorithm named APF-D3QNPER is proposed, which combines the action output method of artificial potential field (APF) with the Dueling Double Deep Q Network algorithm, and experience sample rewards are considered in the experience playback portion of the traditional Deep Reinforcement Learning (DRL) algorithm, which enhances the convergence ability of the traditional DRL algorithm. A long short-term memory (LSTM) network is added to the state feature extraction network part to improve its adaptability in unknown environments and enhance its spatiotemporal sensitivity to the environment. The APF-D3QNPER algorithm is compared with mainstream deep reinforcement learning algorithms and traditional path planning algorithms using a robot operating system and the Gazebo simulation platform by conducting experiments. The results demonstrate that the APF-D3QNPER algorithm exhibits excellent generalization abilities in the simulation environment, and the convergence speed, the loss value, the path planning time, and the path planning length of the APF-D3QNPER algorithm are all less than for other algorithms in diverse scenarios.","PeriodicalId":48760,"journal":{"name":"Applied Sciences-Basel","volume":" ","pages":""},"PeriodicalIF":2.5000,"publicationDate":"2023-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Applied Sciences-Basel","FirstCategoryId":"103","ListUrlMain":"https://doi.org/10.3390/app131810056","RegionNum":4,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"CHEMISTRY, MULTIDISCIPLINARY","Score":null,"Total":0}
Citations: 0

Abstract

Autonomous vehicles can reduce the labor required for cargo transportation and thereby improve transportation efficiency; for example, automated guided vehicles (AGVs) can improve operational efficiency in warehouses. To overcome the limitations of traditional path planning algorithms in unknown environments, such as reliance on high-precision maps and limited generalization and obstacle avoidance capabilities, this study investigates the Deep Q-Network and its derivative algorithms and enhances both the network and algorithm structures. A new algorithm, APF-D3QNPER, is proposed. It combines the action output method of the artificial potential field (APF) with the Dueling Double Deep Q-Network algorithm, and it incorporates experience sample rewards into the experience replay portion of the traditional Deep Reinforcement Learning (DRL) algorithm, which improves convergence. A long short-term memory (LSTM) network is added to the state feature extraction network to improve adaptability in unknown environments and enhance spatiotemporal sensitivity to the environment. APF-D3QNPER is compared experimentally with mainstream deep reinforcement learning algorithms and traditional path planning algorithms using the Robot Operating System (ROS) and the Gazebo simulation platform. The results demonstrate that APF-D3QNPER exhibits excellent generalization ability in the simulation environment and outperforms the other algorithms in convergence speed, loss value, path planning time, and path planning length across diverse scenarios.
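The abstract names three concrete components: a Dueling Double DQN whose state features are extracted by an LSTM, an experience replay scheme whose sampling also takes the stored sample rewards into account, and an APF-based action output combined with the Q-network. Since only the abstract is available here, the sketch below shows one plausible way these pieces could fit together in PyTorch; the layer sizes, the priority formula, and the names LSTMDuelingQNet, RewardAwarePER, and apf_guided_action are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the components described in the abstract (assumptions, not the paper's code).
import random
from collections import deque

import torch
import torch.nn as nn


class LSTMDuelingQNet(nn.Module):
    """Dueling Q-network with an LSTM state-feature extractor (sketch)."""

    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden, batch_first=True)
        # Dueling heads: a scalar state value V(s) and per-action advantages A(s, a).
        self.value = nn.Sequential(nn.Linear(hidden, 64), nn.ReLU(), nn.Linear(64, 1))
        self.advantage = nn.Sequential(nn.Linear(hidden, 64), nn.ReLU(), nn.Linear(64, n_actions))

    def forward(self, obs_seq: torch.Tensor) -> torch.Tensor:
        # obs_seq: (batch, time, obs_dim); the last LSTM output serves as the state feature.
        feat, _ = self.lstm(obs_seq)
        feat = feat[:, -1, :]
        v = self.value(feat)
        a = self.advantage(feat)
        # Standard dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a).
        return v + a - a.mean(dim=1, keepdim=True)


class RewardAwarePER:
    """Prioritized replay whose priority mixes TD error and reward (assumed formula)."""

    def __init__(self, capacity: int = 10000, alpha: float = 0.6, reward_weight: float = 0.3):
        self.buffer = deque(maxlen=capacity)
        self.priorities = deque(maxlen=capacity)
        self.alpha = alpha
        self.reward_weight = reward_weight

    def push(self, transition, td_error: float):
        # transition = (obs_seq, action, reward, next_obs_seq, done)
        reward = transition[2]
        # Assumed priority: TD-error magnitude plus a bonus for positive rewards.
        priority = (abs(td_error) + self.reward_weight * max(reward, 0.0) + 1e-6) ** self.alpha
        self.buffer.append(transition)
        self.priorities.append(priority)

    def sample(self, batch_size: int):
        # Sample transitions with probability proportional to priority.
        total = sum(self.priorities)
        probs = [p / total for p in self.priorities]
        idx = random.choices(range(len(self.buffer)), weights=probs, k=batch_size)
        return [self.buffer[i] for i in idx]


def apf_guided_action(q_values: torch.Tensor, apf_action: int, epsilon: float) -> int:
    """During exploration, fall back to the APF-suggested action instead of a
    uniformly random one (one plausible reading of the APF action-output idea)."""
    if random.random() < epsilon:
        return apf_action
    return int(q_values.argmax().item())
```

The dueling aggregation and the epsilon-greedy structure above are standard patterns; the "double" target computation (selecting the next action with the online network and evaluating it with the target network) is omitted, and the exact way the paper weights rewards in the replay priority or blends APF output with the learned policy would need to be confirmed against the full text.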
Source Journal
Applied Sciences-Basel
Subject categories: CHEMISTRY, MULTIDISCIPLINARY; MATERIALS SCIENCE, MULTIDISCIPLINARY
CiteScore: 5.30
Self-citation rate: 11.10%
Articles published: 10882
About the journal: Applied Sciences (ISSN 2076-3417) provides an advanced forum on all aspects of applied natural sciences. It publishes reviews, research papers and communications. Our aim is to encourage scientists to publish their experimental and theoretical results in as much detail as possible. There is no restriction on the length of the papers. The full experimental details must be provided so that the results can be reproduced. Electronic files and software regarding the full details of the calculation or experimental procedure, if unable to be published in a normal way, can be deposited as supplementary electronic material.