{"title":"Route optimization for autonomous bulldozer by distributed deep reinforcement learning","authors":"Yasuhiro Osaka, Naoya Odajima, Y. Uchimura","doi":"10.1109/ICM46511.2021.9385686","DOIUrl":null,"url":null,"abstract":"Since the publication showed DQN based reinforcement learning methods exceeds human's score in Atari 2600 video games, various deep reinforcement learning have bee researched. This paper proposes a method to control bulldozer autonomously by learning the sediment leveling route using PPO that enables distributed deep reinforcement learning. The simulator was originally developed that enables to reproduce the behavior of small and uniform sediment. By incorporating an LSTM that processes the input state as time-series data into the agent network, more than 95% of the sediment in the target area on average was achieved. In addition, the generalization performance for unknown condition was evaluated, by giving unlearned conditions were given as initial setups.","PeriodicalId":373423,"journal":{"name":"2021 IEEE International Conference on Mechatronics (ICM)","volume":"72 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE International Conference on Mechatronics (ICM)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICM46511.2021.9385686","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 3
Abstract
Since the publication showing that DQN-based reinforcement learning methods exceed human scores in Atari 2600 video games, various deep reinforcement learning methods have been researched. This paper proposes a method to control a bulldozer autonomously by learning the sediment-leveling route using PPO, which enables distributed deep reinforcement learning. A simulator was originally developed that reproduces the behavior of small, uniform sediment. By incorporating an LSTM, which processes the input state as time-series data, into the agent network, more than 95% of the sediment in the target area was leveled on average. In addition, the generalization performance for unknown conditions was evaluated by giving unlearned conditions as initial setups.
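The paper itself does not include code, but the architecture described (an LSTM that processes the observation sequence as time-series data inside a PPO-trainable agent network) can be illustrated with a minimal sketch. The class name, layer sizes, and observation/action dimensions below are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch (assumed, not from the paper) of an LSTM actor-critic
# network of the kind that could be trained with PPO.
import torch
import torch.nn as nn


class LSTMActorCritic(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        # The LSTM treats the sequence of observations as time-series data.
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.policy_head = nn.Linear(hidden, act_dim)  # action logits for the policy
        self.value_head = nn.Linear(hidden, 1)         # state-value estimate used by PPO

    def forward(self, obs_seq, hidden_state=None):
        # obs_seq: (batch, time, obs_dim)
        x = self.encoder(obs_seq)
        x, hidden_state = self.lstm(x, hidden_state)
        last = x[:, -1]                                # features at the most recent time step
        return self.policy_head(last), self.value_head(last), hidden_state


# Usage example with assumed dimensions.
if __name__ == "__main__":
    net = LSTMActorCritic(obs_dim=16, act_dim=4)
    obs = torch.randn(2, 8, 16)                        # batch of 2 trajectories, 8 time steps
    logits, value, _ = net(obs)
    print(logits.shape, value.shape)                   # torch.Size([2, 4]) torch.Size([2, 1])
```

In a distributed PPO setup, multiple simulator workers would collect trajectories with copies of such a network while a central learner applies the clipped PPO update; the specific distribution scheme used by the authors is not detailed in the abstract.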