Amirreza Farahani, Laura Genga, Albert H. Schrotenboer, Remco Dijkman
Capacity planning in logistics corridors: Deep reinforcement learning for the dynamic stochastic temporal bin packing problem
Transportation Research Part E: Logistics and Transportation Review (Journal Article, published 2024-08-31; impact factor 8.3, JCR Q1). DOI: 10.1016/j.tre.2024.103742
https://www.sciencedirect.com/science/article/pii/S1366554524003338
Citations: 0
Abstract
This paper addresses the challenge of managing uncertainty in the daily capacity planning of a terminal in a corridor-based logistics system. Corridor-based logistics systems facilitate the exchange of freight between two distinct regions, usually involving industrial and logistics clusters. In this context, we introduce the dynamic stochastic temporal bin packing problem, which models the real-time assignment of individual containers to carriers’ trucks over discrete time units. We formulate it as a Markov decision process (MDP). Two characteristics distinguish our problem: the stochastic nature of the time-dependent availability of containers, i.e., container delays, and the continuous-time, or dynamic, aspect of the planning, where a container announcement may occur at any moment during the planning horizon. We introduce a real-time planning algorithm based on Proximal Policy Optimization (PPO), a Deep Reinforcement Learning (DRL) method, to allocate individual containers to eligible carriers. In addition, we propose several practical heuristics and two novel rolling-horizon batch-planning methods based on (stochastic) mixed-integer programming (MIP), which can be interpreted as computational information relaxation bounds because they delay decision making. The results show that our proposed DRL method outperforms the practical heuristics and, unlike the stochastic MIP-based approach, scales effectively to larger problems, making it a practically appealing solution.
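To make the underlying packing structure concrete, the following is a minimal sketch of a deterministic first-fit rule for temporal bin packing: each container occupies truck capacity only during its own time interval, and a new truck is opened when no existing truck can hold it. This is an illustration only and is not one of the paper's heuristics, its MDP, or its PPO policy. It ignores container delays, and all names (`Container`, `Truck`, `first_fit`) and the sample data are assumptions introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    size: int    # capacity units the container occupies (hypothetical unit)
    start: int   # first discrete time unit in which it occupies a truck
    end: int     # first time unit after it has left the truck (exclusive)


@dataclass
class Truck:
    capacity: int
    load: dict = field(default_factory=dict)  # time unit -> capacity in use

    def fits(self, c: Container) -> bool:
        # The container fits if capacity holds in every time unit it spans.
        return all(self.load.get(t, 0) + c.size <= self.capacity
                   for t in range(c.start, c.end))

    def add(self, c: Container) -> None:
        for t in range(c.start, c.end):
            self.load[t] = self.load.get(t, 0) + c.size


def first_fit(containers, capacity):
    """Assign each container, in announcement order, to the first truck that fits."""
    trucks, assignment = [], []
    for c in containers:
        if c.size > capacity:
            raise ValueError("container larger than truck capacity")
        for idx, truck in enumerate(trucks):
            if truck.fits(c):
                break
        else:
            # No existing truck can hold the container: open a new one.
            trucks.append(Truck(capacity))
            idx = len(trucks) - 1
        trucks[idx].add(c)
        assignment.append(idx)
    return assignment, len(trucks)


# Hypothetical announcements, processed one at a time as they arrive.
announced = [Container(2, 0, 3), Container(2, 1, 4),
             Container(1, 0, 2), Container(2, 3, 5)]
assignment, trucks_used = first_fit(announced, capacity=3)
```

Because each container is committed as soon as it is announced, the rule mirrors the real-time setting described in the abstract, whereas a batch method would wait and plan several announcements together.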
About the journal:
Transportation Research Part E: Logistics and Transportation Review is a reputable journal that publishes high-quality articles covering a wide range of topics in the field of logistics and transportation research. The journal welcomes submissions on various subjects, including transport economics, transport infrastructure and investment appraisal, evaluation of public policies related to transportation, empirical and analytical studies of logistics management practices and performance, logistics and operations models, and logistics and supply chain management.
Part E aims to provide informative and well-researched articles that contribute to the understanding and advancement of the field. The content of the journal is complementary to other prestigious journals in transportation research, such as Transportation Research Part A: Policy and Practice, Part B: Methodological, Part C: Emerging Technologies, Part D: Transport and Environment, and Part F: Traffic Psychology and Behaviour. Together, these journals form a comprehensive and cohesive reference for current research in transportation science.