Reinforcement learning method based on sample regularization and adaptive learning rate for AGV path planning

Jun Nie, Guihua Zhang, Xiao Lu, Haixia Wang, Chunyang Sheng, Lijie Sun

Neurocomputing, Volume 614, Article 128820. DOI: 10.1016/j.neucom.2024.128820. Published 2024-11-06 (Journal Article; JCR Q1, Computer Science, Artificial Intelligence; Impact Factor 5.5).
This paper proposes a proximal policy optimization (PPO) method based on sample regularization (SR) and an adaptive learning rate (ALR) to address the limited exploration ability and slow convergence speed of reinforcement learning algorithms in Autonomous Guided Vehicle (AGV) path planning in dynamic environments. Firstly, a regularization term based on empirical samples is designed to mitigate the bias and imbalance of training samples; adding this sample regularization to the objective function improves the policy selectivity of the PPO algorithm, thereby increasing the AGV's exploration ability during training in the working environment. Secondly, the Fisher information matrix of the Kullback-Leibler (KL) divergence approximation and the KL divergence constraint term are exploited to design a policy update mechanism with a dynamically adjustable adaptive learning rate throughout training. The method accounts for the geometric structure of the parameter space and the change in the policy gradient, aiming to optimize the parameter update direction and enhance the convergence speed and stability of the algorithm. Finally, an AGV path planning scheme based on reinforcement learning is established for simulation verification and comparison in a two-dimensional raster map and the Gazebo 3D simulation environment. Simulation results verify the feasibility and superiority of the proposed method applied to the AGV path planning problem.
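The abstract's two ingredients — a regularization term added to the PPO objective and a KL-driven adaptive learning rate — can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: the entropy-based sample regularizer and the simple KL-threshold rule (used here in place of the Fisher-information-based mechanism the paper describes) are assumptions, and all function names are hypothetical.

```python
import numpy as np

def clipped_surrogate(ratio, adv, eps=0.2):
    """Standard PPO clipped surrogate objective (to be maximized).
    ratio = pi_new(a|s) / pi_old(a|s), adv = advantage estimates."""
    return np.mean(np.minimum(ratio * adv,
                              np.clip(ratio, 1.0 - eps, 1.0 + eps) * adv))

def sample_regularization(weights):
    """Hypothetical sample-regularization term: reward balanced sample
    weights via their entropy (the abstract does not give the exact form).
    A uniform weighting maximizes this term."""
    p = weights / weights.sum()
    return -np.sum(p * np.log(p + 1e-8))

def adapt_lr(lr, kl, kl_target=0.01, factor=1.5):
    """KL-threshold learning-rate rule: shrink the step when the new
    policy drifted too far from the old one, grow it when the update
    barely moved the policy. A simpler stand-in for the paper's
    Fisher-information-based mechanism."""
    if kl > 2.0 * kl_target:
        return lr / factor   # policy moved too far: slow down
    if kl < 0.5 * kl_target:
        return lr * factor   # policy barely moved: speed up
    return lr

# Toy usage on hand-made numbers (no environment or network involved).
ratio = np.array([1.1, 0.9, 1.3])
adv = np.array([1.0, -0.5, 2.0])
objective = clipped_surrogate(ratio, adv) + 0.01 * sample_regularization(np.ones(3))
lr = adapt_lr(3e-4, kl=0.05)   # KL too large, so the rate is reduced
```

In this sketch the regularizer is simply weighted into the surrogate objective, mirroring the abstract's statement that the sample regularization is "added to the objective function"; the weighting coefficient (0.01 above) would be a tunable hyperparameter.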
About the journal:
Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Its coverage spans neurocomputing theory, practice, and applications.