RL SolVeR Pro: Reinforcement Learning for Solving Vehicle Routing Problem

Arun Kumar Kalakanti, Shivani Verma, T. Paul, Takufumi Yoshida
{"title":"RL SolVeR Pro: Reinforcement Learning for Solving Vehicle Routing Problem","authors":"Arun Kumar Kalakanti, Shivani Verma, T. Paul, Takufumi Yoshida","doi":"10.1109/AiDAS47888.2019.8970890","DOIUrl":null,"url":null,"abstract":"Vehicle Routing Problem (VRP) is a well-known NP-hard combinatorial optimization problem at the heart of the transportation and logistics research. VRP can be exactly solved only for small instances of the problem with conventional methods. Traditionally this problem has been solved using heuristic methods for large instances even though there is no guarantee of optimality. Efficient solution adopted to VRP may lead to significant savings per year in large transportation and logistics systems. Much of the recent works using Reinforcement Learning are computationally intensive and face the three curse of dimensionality: explosions in state and action spaces and high stochasticity i.e., large number of possible next states for a given state action pair. Also, recent works on VRP don’t consider the realistic simulation settings of customer environments, stochastic elements and scalability aspects as they use only standard Solomon benchmark instances of at most 100 customers. In this work, Reinforcement Learning Solver for Vehicle Routing Problem (RL SolVeR Pro) is proposed wherein the optimal route learning problem is cast as a Markov Decision Process (MDP). The curse of dimensionality of RL is also overcome by using two-phase solver with geometric clustering. Also, realistic simulation for VRP was used to validate the effectiveness and applicability of the proposed RL SolVeR Pro under various conditions and constraints. Our simulation results suggest that our proposed method is able to obtain better or same level of results, compared to the two best-known heuristics: Clarke-Wright Savings and Sweep Heuristic. 
The proposed RL Solver can be applied to other variants of the VRP and has the potential to be applied more generally to other combinatorial optimization problems.","PeriodicalId":227508,"journal":{"name":"2019 1st International Conference on Artificial Intelligence and Data Sciences (AiDAS)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"16","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 1st International Conference on Artificial Intelligence and Data Sciences (AiDAS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/AiDAS47888.2019.8970890","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 16

Abstract

The Vehicle Routing Problem (VRP) is a well-known NP-hard combinatorial optimization problem at the heart of transportation and logistics research. With conventional methods, VRP can be solved exactly only for small instances. Traditionally, large instances have been solved with heuristic methods, even though these offer no guarantee of optimality. An efficient VRP solution can yield significant annual savings in large transportation and logistics systems. Much of the recent work using Reinforcement Learning is computationally intensive and faces three curses of dimensionality: explosions in the state and action spaces, and high stochasticity, i.e., a large number of possible next states for a given state-action pair. Moreover, recent work on VRP does not consider realistic simulation settings of customer environments, stochastic elements, or scalability, as it uses only the standard Solomon benchmark instances of at most 100 customers. In this work, a Reinforcement Learning Solver for the Vehicle Routing Problem (RL SolVeR Pro) is proposed, in which the optimal route learning problem is cast as a Markov Decision Process (MDP). The curse of dimensionality is overcome by using a two-phase solver with geometric clustering. A realistic VRP simulation is used to validate the effectiveness and applicability of the proposed RL SolVeR Pro under various conditions and constraints. Our simulation results suggest that the proposed method obtains results better than or on par with the two best-known heuristics: Clarke-Wright Savings and the Sweep heuristic. The proposed RL solver can be applied to other variants of the VRP and has the potential to be applied more generally to other combinatorial optimization problems.
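The abstract names the Sweep heuristic as one of the two comparison baselines but does not give its implementation. As an illustration only, a minimal sketch of the classical Sweep heuristic (cluster customers by polar angle around the depot, then cut routes at vehicle capacity) might look like the following; the function name, inputs, and greedy route-cutting rule are assumptions, not details from the paper:

```python
import math

def sweep_heuristic(depot, customers, demands, capacity):
    """Sweep heuristic sketch: sort customers by polar angle around
    the depot, then fill routes greedily up to vehicle capacity."""
    # Sort customer indices by their polar angle relative to the depot.
    order = sorted(
        range(len(customers)),
        key=lambda i: math.atan2(customers[i][1] - depot[1],
                                 customers[i][0] - depot[0]),
    )
    routes, current, load = [], [], 0
    for i in order:
        if load + demands[i] > capacity:  # vehicle full: start a new route
            routes.append(current)
            current, load = [], 0
        current.append(i)
        load += demands[i]
    if current:
        routes.append(current)
    return routes
```

In practice, each route produced by the sweep would then be sequenced with a TSP heuristic; this sketch stops at the clustering step, which is conceptually the same cluster-first, route-second structure as the two-phase approach the abstract describes.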