A decision-making of autonomous driving method based on DDPG with pretraining

Jinlin Ma, Mingyu Zhang, Kaiping Ma, Houzhong Zhang, Guoqing Geng
{"title":"A decision-making of autonomous driving method based on DDPG with pretraining","authors":"Jinlin Ma, Mingyu Zhang, Kaiping Ma, Houzhong Zhang, Guoqing Geng","doi":"10.1177/09544070241227303","DOIUrl":null,"url":null,"abstract":"Present the DDPGwP (DDPG with Pretraining) model, grounded in the framework of deep reinforcement learning, designed for autonomous driving decision-making. The model incorporates imitation learning by utilizing expert experience for supervised learning during initial training and weight preservation. A novel loss function is devised, enabling the expert experience to jointly guide the Actor network’s update alongside the Critic network while also participating in the Critic network’s updates. This approach allows imitation learning to dominate the early stages of training, with reinforcement learning taking the lead in later stages. Employing experience replay buffer separation techniques, we categorize and store collected superior, ordinary, and expert experiences. We select sensor inputs from the TORCS (The Open Racing Car Simulator) simulation platform and conduct experimental validation, comparing the results with the original DDPG, A2C, and PPO algorithms. Experimental outcomes reveal that incorporating imitation learning significantly accelerates early-stage training, reduces blind trial-and-error during initial exploration, and enhances algorithm stability and safety. The experience replay buffer separation technique improves sampling efficiency and mitigates algorithm overfitting. In addition to expediting algorithm training rates, our approach enables the simulated vehicle to learn superior strategies, garnering higher reward values. This demonstrates the superior stability, safety, and policy-making capabilities of the proposed algorithm, as well as accelerated network convergence.","PeriodicalId":509770,"journal":{"name":"Proceedings of the Institution of Mechanical Engineers, Part D: Journal of Automobile Engineering","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Institution of Mechanical Engineers, Part D: Journal of Automobile Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1177/09544070241227303","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

We present the DDPGwP (DDPG with Pretraining) model, a deep-reinforcement-learning approach to autonomous driving decision-making. The model incorporates imitation learning by using expert experience for supervised learning during initial training and by preserving the resulting weights. A novel loss function lets expert experience guide the Actor network's updates jointly with the Critic network, while also contributing to the Critic network's own updates. As a result, imitation learning dominates the early stages of training, with reinforcement learning taking the lead in later stages. Using an experience-replay-buffer separation technique, we categorize and store the collected superior, ordinary, and expert experiences. We select sensor inputs from the TORCS (The Open Racing Car Simulator) simulation platform and validate the approach experimentally, comparing the results with the original DDPG, A2C, and PPO algorithms. The experiments show that incorporating imitation learning significantly accelerates early-stage training, reduces blind trial-and-error during initial exploration, and improves the algorithm's stability and safety. The buffer separation technique improves sampling efficiency and mitigates overfitting. Beyond faster training, the approach enables the simulated vehicle to learn better strategies and earn higher reward values, demonstrating the superior stability, safety, and decision-making capability of the proposed algorithm, as well as accelerated network convergence.
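The abstract pins down two mechanisms concretely enough to sketch: an Actor loss that blends a supervised imitation term (toward expert actions) with the standard DDPG policy-gradient term, and a replay memory split into expert, superior, and ordinary buffers. Below is a minimal Python/PyTorch sketch of how such pieces could fit together; it is an illustration, not the authors' implementation, and the mixing weight `lam`, its decay schedule, the buffer capacity, and the sampling ratios are all assumptions.

```python
import random
import torch.nn.functional as F

# Illustrative sketch of a DDPG-with-pretraining actor update and a
# separated replay memory, as described in the abstract. All constants
# (lam schedule, capacity, sampling ratios) are assumptions.

def actor_loss(actor, critic, states, expert_actions, lam):
    """Blend imitation learning with the DDPG policy gradient.

    lam is near 1 early in training, so the supervised (imitation) term
    dominates; as lam decays toward 0, the reinforcement-learning term
    takes over, matching the staged behavior the abstract describes.
    """
    actions = actor(states)
    bc_term = F.mse_loss(actions, expert_actions)   # supervised imitation term
    rl_term = -critic(states, actions).mean()       # standard DDPG objective
    return lam * bc_term + (1.0 - lam) * rl_term

class SeparatedReplayBuffer:
    """Three buffers: expert demonstrations, 'superior' transitions
    (e.g. from high-reward episodes), and ordinary transitions."""

    def __init__(self, capacity=100_000):
        self.expert, self.superior, self.ordinary = [], [], []
        self.capacity = capacity

    def add(self, transition, kind="ordinary"):
        buf = getattr(self, kind)  # kind in {"expert", "superior", "ordinary"}
        if len(buf) >= self.capacity:
            buf.pop(0)             # drop the oldest transition
        buf.append(transition)

    def sample(self, batch_size, ratios=(0.25, 0.25, 0.5)):
        # Draw a fixed fraction of each batch from every buffer, so expert
        # transitions keep appearing in Critic updates as well — loosely
        # reflecting the abstract's claim that expert experience also
        # participates in the Critic network's updates.
        batch = []
        for buf, frac in zip((self.expert, self.superior, self.ordinary), ratios):
            k = min(int(batch_size * frac), len(buf))
            batch.extend(random.sample(buf, k))
        return batch
```

With a schedule such as `lam = max(0.0, 1.0 - step / warmup_steps)`, early updates are dominated by the supervised term while later updates are driven by the Critic's value estimates; the actual schedule and ratios used in the paper are not given in the abstract.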