Path Planning for Autonomous Vehicles in Unknown Dynamic Environment Based on Deep Reinforcement Learning

Applied Sciences-Basel | IF 2.5 | JCR Q2 (CHEMISTRY, MULTIDISCIPLINARY) | CAS Tier 4, Multidisciplinary Journal | Pub Date: 2023-09-06 | DOI: 10.3390/app131810056
Hui Hu, Yuge Wang, Wenjie Tong, Jiao Zhao, Yulei Gu
{"title":"Path Planning for Autonomous Vehicles in Unknown Dynamic Environment Based on Deep Reinforcement Learning","authors":"Hui Hu, Yuge Wang, Wenjie Tong, Jiao Zhao, Yulei Gu","doi":"10.3390/app131810056","DOIUrl":null,"url":null,"abstract":"Autonomous vehicles can reduce labor power during cargo transportation, and then improve transportation efficiency, for example, the automated guided vehicle (AGV) in the warehouse can improve the operation efficiency. To overcome the limitations of traditional path planning algorithms in unknown environments, such as reliance on high-precision maps, lack of generalization ability, and obstacle avoidance capability, this study focuses on investigating the Deep Q-Network and its derivative algorithm to enhance network and algorithm structures. A new algorithm named APF-D3QNPER is proposed, which combines the action output method of artificial potential field (APF) with the Dueling Double Deep Q Network algorithm, and experience sample rewards are considered in the experience playback portion of the traditional Deep Reinforcement Learning (DRL) algorithm, which enhances the convergence ability of the traditional DRL algorithm. A long short-term memory (LSTM) network is added to the state feature extraction network part to improve its adaptability in unknown environments and enhance its spatiotemporal sensitivity to the environment. The APF-D3QNPER algorithm is compared with mainstream deep reinforcement learning algorithms and traditional path planning algorithms using a robot operating system and the Gazebo simulation platform by conducting experiments. The results demonstrate that the APF-D3QNPER algorithm exhibits excellent generalization abilities in the simulation environment, and the convergence speed, the loss value, the path planning time, and the path planning length of the APF-D3QNPER algorithm are all less than for other algorithms in diverse scenarios.","PeriodicalId":48760,"journal":{"name":"Applied Sciences-Basel","volume":" ","pages":""},"PeriodicalIF":2.5000,"publicationDate":"2023-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Applied Sciences-Basel","FirstCategoryId":"103","ListUrlMain":"https://doi.org/10.3390/app131810056","RegionNum":4,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"CHEMISTRY, MULTIDISCIPLINARY","Score":null,"Total":0}
Citations: 0

Abstract

Autonomous vehicles can reduce the labor required for cargo transportation and thereby improve transportation efficiency; for example, automated guided vehicles (AGVs) can improve operational efficiency in warehouses. To overcome the limitations of traditional path planning algorithms in unknown environments, such as reliance on high-precision maps and limited generalization and obstacle avoidance capabilities, this study investigates the Deep Q-Network and its derivative algorithms and enhances both the network and algorithm structures. A new algorithm, APF-D3QNPER, is proposed. It combines the action output method of the artificial potential field (APF) with the Dueling Double Deep Q-Network algorithm, and it incorporates experience sample rewards into the experience replay portion of the traditional Deep Reinforcement Learning (DRL) algorithm, which improves convergence. A long short-term memory (LSTM) network is added to the state feature extraction network to improve adaptability in unknown environments and enhance spatiotemporal sensitivity to the environment. APF-D3QNPER is compared experimentally with mainstream deep reinforcement learning algorithms and traditional path planning algorithms using the Robot Operating System (ROS) and the Gazebo simulation platform. The results demonstrate that APF-D3QNPER exhibits excellent generalization ability in the simulation environment and outperforms the other algorithms in convergence speed, loss value, path planning time, and path planning length across diverse scenarios.
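The abstract names three concrete components: a Dueling Double DQN whose state features are extracted by an LSTM, an experience replay scheme whose sampling also takes the stored sample rewards into account, and an APF-based action output combined with the Q-network. Since only the abstract is available here, the sketch below shows one plausible way these pieces could fit together in PyTorch; the layer sizes, the priority formula, and the names LSTMDuelingQNet, RewardAwarePER, and apf_guided_action are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the components described in the abstract (assumptions, not the paper's code).
import random
from collections import deque

import torch
import torch.nn as nn


class LSTMDuelingQNet(nn.Module):
    """Dueling Q-network with an LSTM state-feature extractor (sketch)."""

    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden, batch_first=True)
        # Dueling heads: a scalar state value V(s) and per-action advantages A(s, a).
        self.value = nn.Sequential(nn.Linear(hidden, 64), nn.ReLU(), nn.Linear(64, 1))
        self.advantage = nn.Sequential(nn.Linear(hidden, 64), nn.ReLU(), nn.Linear(64, n_actions))

    def forward(self, obs_seq: torch.Tensor) -> torch.Tensor:
        # obs_seq: (batch, time, obs_dim); the last LSTM output serves as the state feature.
        feat, _ = self.lstm(obs_seq)
        feat = feat[:, -1, :]
        v = self.value(feat)
        a = self.advantage(feat)
        # Standard dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a).
        return v + a - a.mean(dim=1, keepdim=True)


class RewardAwarePER:
    """Prioritized replay whose priority mixes TD error and reward (assumed formula)."""

    def __init__(self, capacity: int = 10000, alpha: float = 0.6, reward_weight: float = 0.3):
        self.buffer = deque(maxlen=capacity)
        self.priorities = deque(maxlen=capacity)
        self.alpha = alpha
        self.reward_weight = reward_weight

    def push(self, transition, td_error: float):
        # transition = (obs_seq, action, reward, next_obs_seq, done)
        reward = transition[2]
        # Assumed priority: TD-error magnitude plus a bonus for positive rewards.
        priority = (abs(td_error) + self.reward_weight * max(reward, 0.0) + 1e-6) ** self.alpha
        self.buffer.append(transition)
        self.priorities.append(priority)

    def sample(self, batch_size: int):
        # Sample transitions with probability proportional to priority.
        total = sum(self.priorities)
        probs = [p / total for p in self.priorities]
        idx = random.choices(range(len(self.buffer)), weights=probs, k=batch_size)
        return [self.buffer[i] for i in idx]


def apf_guided_action(q_values: torch.Tensor, apf_action: int, epsilon: float) -> int:
    """During exploration, fall back to the APF-suggested action instead of a
    uniformly random one (one plausible reading of the APF action-output idea)."""
    if random.random() < epsilon:
        return apf_action
    return int(q_values.argmax().item())
```

The dueling aggregation and the epsilon-greedy structure above are standard patterns; the "double" target computation (selecting the next action with the online network and evaluating it with the target network) is omitted, and the exact way the paper weights rewards in the replay priority or blends APF output with the learned policy would need to be confirmed against the full text.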
Source Journal
Applied Sciences-Basel
Subject categories: CHEMISTRY, MULTIDISCIPLINARY; MATERIALS SCIENCE, MULTIDISCIPLINARY
CiteScore: 5.30
Self-citation rate: 11.10%
Articles published: 10882
About the journal: Applied Sciences (ISSN 2076-3417) provides an advanced forum on all aspects of applied natural sciences. It publishes reviews, research papers and communications. Our aim is to encourage scientists to publish their experimental and theoretical results in as much detail as possible. There is no restriction on the length of the papers. The full experimental details must be provided so that the results can be reproduced. Electronic files and software regarding the full details of the calculation or experimental procedure, if unable to be published in a normal way, can be deposited as supplementary electronic material.