Yanan Cao, Dongbin Zhao, Xiang Cao
2023 6th International Symposium on Autonomous Systems (ISAS), June 23, 2023
DOI: 10.1109/ISAS59543.2023.10164323
A New Deep Reinforcement Learning Based Robot Path Planning Algorithm without Target Network
Intelligent agent navigation has broad application scenarios, and dynamic path planning for agents is one of the most active research directions in this field. Because the target network in deep reinforcement learning gradually deviates from the online network, the target network is removed and a softmax operator is applied in place of the max operator in the remaining network. A new prioritized experience replay scheme improves the agent's utilization of past experience, and a dueling network architecture improves its perception of the environment during path planning. A modified dynamic ϵ-greedy algorithm is then developed for action selection. Experimental results in a simulation environment show that even without the target network, the proposed algorithm still converges to a high value within a limited number of episodes, which demonstrates its effectiveness.
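The abstract does not give the exact formulas, but the two central ingredients — a softmax operator replacing max in the Bellman backup, and a dynamic ϵ-greedy schedule — can be sketched roughly as follows. The inverse temperature `beta` and the decay schedule are illustrative assumptions, not values from the paper:

```python
import numpy as np

def softmax_backup(q_next, beta=5.0):
    """Softmax-weighted Bellman backup replacing max(Q(s', a')).

    A weighted average of next-state Q-values; as beta -> infinity it
    approaches the hard max. beta=5.0 is an assumed illustrative value.
    """
    w = np.exp(beta * (q_next - q_next.max()))  # subtract max for stability
    w /= w.sum()
    return float(np.dot(w, q_next))

def dynamic_epsilon(episode, eps_start=1.0, eps_end=0.05, decay=0.995):
    """Hypothetical dynamic epsilon schedule: exponential decay to a floor."""
    return max(eps_end, eps_start * decay ** episode)

def select_action(q_values, episode, rng):
    """Epsilon-greedy action selection using the dynamic schedule."""
    if rng.random() < dynamic_epsilon(episode):
        return rng.integers(len(q_values))   # explore
    return int(np.argmax(q_values))          # exploit

q_next = np.array([1.0, 2.0, 3.0])
backup = softmax_backup(q_next)  # soft value: between mean and max of q_next
```

With this operator the update target becomes `r + gamma * softmax_backup(q_next)`, which smooths the over-estimation bias of the hard max and, per the abstract, compensates for the deleted target network.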