{"title":"Path planning of improved DQN based on quantile regression","authors":"Lun Zhou, Ke Wang, Hang Yu, Zhen Wang","doi":"10.1109/AICIT55386.2022.9930247","DOIUrl":null,"url":null,"abstract":"To solve the problems of slow convergence and overestimation of the value of quantile regression-deep reinforcement learning algorithm, a Dueling Double Depth Q algorithm based on quantile regression (QR-D3QN) was proposed. Based on QR-DQN, the calculation method of the target Q value is modified to reduce the influence of value overestimation. Combining the confrontation network and adding preferential experience sampling to improve the utilization efficiency of effective data. It is verified by the ROSGazebo simulation platform that the robot can effectively select actions, get a good strategy, and can quickly avoid obstacles and find the target point. Compared with D3QN, the route planned by the robot is shortened by 4.95%, and the obstacle avoidance path is reduced by 18.8%.","PeriodicalId":231070,"journal":{"name":"2022 International Conference on Artificial Intelligence and Computer Information Technology (AICIT)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 International Conference on Artificial Intelligence and Computer Information Technology (AICIT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/AICIT55386.2022.9930247","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1
Abstract
To address the slow convergence and value overestimation of the quantile regression deep reinforcement learning algorithm (QR-DQN), a Dueling Double Deep Q-Network based on quantile regression (QR-D3QN) is proposed. Building on QR-DQN, the computation of the target Q-value is modified to reduce the effect of value overestimation; a dueling network architecture is incorporated and prioritized experience replay is added to improve the utilization of effective data. Experiments on the ROS-Gazebo simulation platform verify that the robot selects actions effectively, learns a good policy, and can quickly avoid obstacles and reach the target point. Compared with D3QN, the route planned by the robot is shortened by 4.95%, and the obstacle-avoidance path is reduced by 18.8%.
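The core modification described in the abstract can be sketched as follows. This is a minimal NumPy illustration of how a Double-DQN-style target is typically combined with QR-DQN's quantile values: the online network selects the greedy next action while the target network evaluates its quantiles, which mitigates overestimation. The function name and shapes are assumptions for illustration, not the paper's actual code.

```python
import numpy as np

def qr_double_target(next_quantiles_online, next_quantiles_target,
                     reward, gamma, done):
    """Sketch of a Double-DQN target for quantile regression.

    next_quantiles_online / next_quantiles_target:
        arrays of shape (num_actions, num_quantiles) predicted
        by the online and target networks for the next state.
    Returns the target quantiles, shape (num_quantiles,).
    """
    # Mean over quantiles gives each action's scalar Q-value estimate.
    q_online = next_quantiles_online.mean(axis=1)
    # Online network picks the greedy action (the "Double" trick) ...
    a_star = int(np.argmax(q_online))
    # ... but the target network supplies that action's quantile values.
    theta_next = next_quantiles_target[a_star]
    # Distributional Bellman backup applied per quantile.
    return reward + gamma * (1.0 - done) * theta_next
```

In plain QR-DQN the target network both selects and evaluates the next action, which is the source of the overestimation bias this variant aims to reduce.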