A comparison of reinforcement learning policies for dynamic vehicle routing problems with stochastic customer requests
Fabian Akkerman, Martijn Mes, Willem van Jaarsveld
DOI: 10.1016/j.cie.2024.110747
Computers & Industrial Engineering, Volume 200, Article 110747, February 2025
https://www.sciencedirect.com/science/article/pii/S0360835224008696
Citations: 0
Abstract
This paper presents directions for using reinforcement learning with neural networks for dynamic vehicle routing problems (DVRPs). DVRPs involve sequential decision-making under uncertainty, where the expected future consequences are ideally included in current decision-making. A frequently used framework for these problems is approximate dynamic programming (ADP) or reinforcement learning (RL), often in conjunction with a parametric value function approximation (VFA). A straightforward way to use VFA in DVRPs is linear regression (LVFA), but more complex, non-linear predictors, e.g., neural network VFAs (NNVFA), are also widely used. Alternatively, we may represent the policy directly, using a linear policy function approximation (LPFA) or a neural network PFA (NNPFA). The abundance of policies and design choices complicates the use of neural networks for DVRPs in research and practice. We provide a structured overview of the similarities and differences between the policy classes. Furthermore, we present an empirical comparison of LVFA, LPFA, NNVFA, and NNPFA policies. The comparison is conducted on several problem variants of the DVRP with stochastic customer requests. To validate our findings, we study realistic extensions of the stylized problem on (i) a same-day parcel pickup and delivery case in the city of Amsterdam, the Netherlands, and (ii) the routing of robots in an automated storage and retrieval system (AS/RS). Based on our empirical evaluation, we provide insights into the advantages and disadvantages of neural network policies compared to linear policies, and of value-based approaches compared to policy-based approaches.
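To make the LVFA/NNVFA distinction concrete, the following is a minimal illustrative sketch — not code from the paper — of a linear and a one-hidden-layer neural value function approximation trained with temporal-difference updates, plus a greedy value-based action rule. The feature definitions (route slack, detour, open requests), the accept/reject action space, and all hyperparameters are hypothetical stand-ins for whatever state representation a concrete DVRP would use.

```python
import numpy as np

rng = np.random.default_rng(0)

def features(state, action):
    # Hypothetical basis functions for a DVRP post-decision state:
    # remaining route slack, insertion detour, number of open requests.
    slack, detour, open_requests = state
    return np.array([1.0, slack, detour, open_requests, action * detour])

class LinearVFA:
    """Linear value function approximation: V(s, a) = w . phi(s, a)."""
    def __init__(self, n_features):
        self.w = np.zeros(n_features)

    def value(self, state, action):
        return self.w @ features(state, action)

    def td_update(self, state, action, target, alpha=0.01):
        # One temporal-difference step toward a (bootstrapped) target value.
        phi = features(state, action)
        self.w += alpha * (target - self.value(state, action)) * phi

class NeuralVFA:
    """One-hidden-layer neural network VFA trained by SGD on TD targets."""
    def __init__(self, n_features, hidden=8):
        self.W1 = rng.normal(scale=0.1, size=(hidden, n_features))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(scale=0.1, size=hidden)

    def value(self, state, action):
        h = np.tanh(self.W1 @ features(state, action) + self.b1)
        return self.W2 @ h

    def td_update(self, state, action, target, alpha=0.01):
        phi = features(state, action)
        h = np.tanh(self.W1 @ phi + self.b1)
        err = target - self.W2 @ h
        # Backpropagate the TD error through the single hidden layer.
        grad_z = err * self.W2 * (1 - h**2)
        self.W2 += alpha * err * h
        self.b1 += alpha * grad_z
        self.W1 += alpha * np.outer(grad_z, phi)

def greedy_action(vfa, state):
    # Value-based policy: pick the action (reject=0 / accept=1, here)
    # with the highest approximated value.
    return max((0, 1), key=lambda a: vfa.value(state, a))
```

A PFA, by contrast, would parameterize the action distribution directly (e.g., a softmax over the same features) and adjust those parameters with a policy-gradient update rather than regressing toward value targets.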
Journal overview:
Computers & Industrial Engineering (CAIE) is dedicated to researchers, educators, and practitioners in industrial engineering and related fields. Having pioneered the integration of computers into research, education, and practice, industrial engineering has evolved to make computers and electronic communication integral to its domain. CAIE publishes original contributions on the development of novel computerized methodologies for industrial engineering problems, as well as applications of these methodologies to issues within the broader industrial engineering and associated communities. The journal actively encourages submissions that push the boundaries of fundamental theories and concepts in industrial engineering techniques.