A decision-making of autonomous driving method based on DDPG with pretraining

Jinlin Ma, Mingyu Zhang, Kaiping Ma, Houzhong Zhang, Guoqing Geng
{"title":"A decision-making of autonomous driving method based on DDPG with pretraining","authors":"Jinlin Ma, Mingyu Zhang, Kaiping Ma, Houzhong Zhang, Guoqing Geng","doi":"10.1177/09544070241227303","DOIUrl":null,"url":null,"abstract":"Present the DDPGwP (DDPG with Pretraining) model, grounded in the framework of deep reinforcement learning, designed for autonomous driving decision-making. The model incorporates imitation learning by utilizing expert experience for supervised learning during initial training and weight preservation. A novel loss function is devised, enabling the expert experience to jointly guide the Actor network’s update alongside the Critic network while also participating in the Critic network’s updates. This approach allows imitation learning to dominate the early stages of training, with reinforcement learning taking the lead in later stages. Employing experience replay buffer separation techniques, we categorize and store collected superior, ordinary, and expert experiences. We select sensor inputs from the TORCS (The Open Racing Car Simulator) simulation platform and conduct experimental validation, comparing the results with the original DDPG, A2C, and PPO algorithms. Experimental outcomes reveal that incorporating imitation learning significantly accelerates early-stage training, reduces blind trial-and-error during initial exploration, and enhances algorithm stability and safety. The experience replay buffer separation technique improves sampling efficiency and mitigates algorithm overfitting. In addition to expediting algorithm training rates, our approach enables the simulated vehicle to learn superior strategies, garnering higher reward values. This demonstrates the superior stability, safety, and policy-making capabilities of the proposed algorithm, as well as accelerated network convergence.","PeriodicalId":509770,"journal":{"name":"Proceedings of the Institution of Mechanical Engineers, Part D: Journal of Automobile Engineering","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Institution of Mechanical Engineers, Part D: Journal of Automobile Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1177/09544070241227303","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

We present the DDPGwP (DDPG with Pretraining) model, a deep-reinforcement-learning approach to autonomous driving decision-making. The model incorporates imitation learning by using expert experience for supervised learning during initial training and by preserving the resulting weights. A novel loss function lets expert experience guide the Actor network's updates jointly with the Critic network, while also contributing to the Critic network's own updates. As a result, imitation learning dominates the early stages of training, with reinforcement learning taking the lead in later stages. Using an experience-replay-buffer separation technique, we categorize and store the collected superior, ordinary, and expert experiences. We select sensor inputs from the TORCS (The Open Racing Car Simulator) simulation platform and validate the approach experimentally, comparing the results with the original DDPG, A2C, and PPO algorithms. The experiments show that incorporating imitation learning significantly accelerates early-stage training, reduces blind trial-and-error during initial exploration, and improves the algorithm's stability and safety. The buffer separation technique improves sampling efficiency and mitigates overfitting. Beyond faster training, the approach enables the simulated vehicle to learn better strategies and earn higher reward values, demonstrating the superior stability, safety, and decision-making capability of the proposed algorithm, as well as accelerated network convergence.
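The abstract pins down two mechanisms concretely enough to sketch: an Actor loss that blends a supervised imitation term (toward expert actions) with the standard DDPG policy-gradient term, and a replay memory split into expert, superior, and ordinary buffers. Below is a minimal Python/PyTorch sketch of how such pieces could fit together; it is an illustration, not the authors' implementation, and the mixing weight `lam`, its decay schedule, the buffer capacity, and the sampling ratios are all assumptions.

```python
import random
import torch.nn.functional as F

# Illustrative sketch of a DDPG-with-pretraining actor update and a
# separated replay memory, as described in the abstract. All constants
# (lam schedule, capacity, sampling ratios) are assumptions.

def actor_loss(actor, critic, states, expert_actions, lam):
    """Blend imitation learning with the DDPG policy gradient.

    lam is near 1 early in training, so the supervised (imitation) term
    dominates; as lam decays toward 0, the reinforcement-learning term
    takes over, matching the staged behavior the abstract describes.
    """
    actions = actor(states)
    bc_term = F.mse_loss(actions, expert_actions)   # supervised imitation term
    rl_term = -critic(states, actions).mean()       # standard DDPG objective
    return lam * bc_term + (1.0 - lam) * rl_term

class SeparatedReplayBuffer:
    """Three buffers: expert demonstrations, 'superior' transitions
    (e.g. from high-reward episodes), and ordinary transitions."""

    def __init__(self, capacity=100_000):
        self.expert, self.superior, self.ordinary = [], [], []
        self.capacity = capacity

    def add(self, transition, kind="ordinary"):
        buf = getattr(self, kind)  # kind in {"expert", "superior", "ordinary"}
        if len(buf) >= self.capacity:
            buf.pop(0)             # drop the oldest transition
        buf.append(transition)

    def sample(self, batch_size, ratios=(0.25, 0.25, 0.5)):
        # Draw a fixed fraction of each batch from every buffer, so expert
        # transitions keep appearing in Critic updates as well — loosely
        # reflecting the abstract's claim that expert experience also
        # participates in the Critic network's updates.
        batch = []
        for buf, frac in zip((self.expert, self.superior, self.ordinary), ratios):
            k = min(int(batch_size * frac), len(buf))
            batch.extend(random.sample(buf, k))
        return batch
```

With a schedule such as `lam = max(0.0, 1.0 - step / warmup_steps)`, early updates are dominated by the supervised term while later updates are driven by the Critic's value estimates; the actual schedule and ratios used in the paper are not given in the abstract.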