{"title":"Dynamic Demand Management for Parcel Lockers","authors":"Daniela Sailer, Robert Klein, Claudius Steinhardt","doi":"arxiv-2409.05061","DOIUrl":null,"url":null,"abstract":"In pursuit of a more sustainable and cost-efficient last mile, parcel lockers\nhave gained a firm foothold in the parcel delivery landscape. To fully exploit\ntheir potential and simultaneously ensure customer satisfaction, successful\nmanagement of the locker's limited capacity is crucial. This is challenging as\nfuture delivery requests and pickup times are stochastic from the provider's\nperspective. In response, we propose to dynamically control whether the locker\nis presented as an available delivery option to each incoming customer with the\ngoal of maximizing the number of served requests weighted by their priority.\nAdditionally, we take different compartment sizes into account, which entails a\nsecond type of decision as parcels scheduled for delivery must be allocated. We\nformalize the problem as an infinite-horizon sequential decision problem and\nfind that exact methods are intractable due to the curses of dimensionality. In\nlight of this, we develop a solution framework that orchestrates multiple\nalgorithmic techniques rooted in Sequential Decision Analytics and\nReinforcement Learning, namely cost function approximation and an offline\ntrained parametric value function approximation together with a truncated\nonline rollout. Our innovative approach to combine these techniques enables us\nto address the strong interrelations between the two decision types. As a\ngeneral methodological contribution, we enhance the training of our value\nfunction approximation with a modified version of experience replay that\nenforces structure in the value function. Our computational study shows that\nour method outperforms a myopic benchmark by 13.7% and an industry-inspired\npolicy by 12.6%.","PeriodicalId":501479,"journal":{"name":"arXiv - CS - Artificial Intelligence","volume":"36 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Artificial Intelligence","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.05061","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
In pursuit of a more sustainable and cost-efficient last mile, parcel lockers have gained a firm foothold in the parcel delivery landscape. To fully exploit their potential and simultaneously ensure customer satisfaction, successful management of the locker's limited capacity is crucial. This is challenging because future delivery requests and pickup times are stochastic from the provider's perspective. In response, we propose to dynamically control whether the locker is presented as an available delivery option to each incoming customer, with the goal of maximizing the number of served requests weighted by their priority. Additionally, we take different compartment sizes into account, which entails a second type of decision, as parcels scheduled for delivery must be allocated to suitable compartments. We formalize the problem as an infinite-horizon sequential decision problem and find that exact methods are intractable due to the curses of dimensionality. In light of this, we develop a solution framework that orchestrates multiple algorithmic techniques rooted in Sequential Decision Analytics and Reinforcement Learning, namely cost function approximation and an offline-trained parametric value function approximation together with a truncated online rollout. Our innovative approach to combining these techniques enables us to address the strong interrelations between the two decision types. As a general methodological contribution, we enhance the training of our value function approximation with a modified version of experience replay that enforces structure in the value function. Our computational study shows that our method outperforms a myopic benchmark by 13.7% and an industry-inspired policy by 12.6%.
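To make the two coupled decision types concrete, the following is a minimal Python sketch of the decision structure the abstract describes: per incoming request, decide whether to offer the locker and, if so, which compartment size to allocate, scored by a one-step lookahead against some value function approximation. All names here (LockerState, Request, decide, the toy value function) are hypothetical illustrations under assumed nested size classes (S < M < L), not the authors' implementation.

```python
# Illustrative only: state, request, and a one-step-lookahead rule for the
# two coupled decisions (offer the locker? which compartment size?).
from dataclasses import dataclass

SIZE_ORDER = ["S", "M", "L"]  # assumed nested compartment size classes


@dataclass
class Request:
    size: str        # requested parcel size class ("S", "M", or "L")
    priority: float  # weight of serving this request


@dataclass
class LockerState:
    free: dict       # free compartments per size class, e.g. {"S": 10, "M": 6, "L": 2}


def feasible_sizes(state: LockerState, parcel_size: str) -> list:
    """Size classes that could hold the parcel: its own class or any larger one."""
    candidates = SIZE_ORDER[SIZE_ORDER.index(parcel_size):]
    return [s for s in candidates if state.free.get(s, 0) > 0]


def decide(state: LockerState, request: Request, value_fn):
    """Offer the locker only if the best feasible allocation (immediate
    priority reward plus approximate value of the post-decision state)
    beats rejecting the request and preserving capacity."""
    best_action, best_size, best_score = "reject", None, value_fn(state)
    for size in feasible_sizes(state, request.size):
        next_free = dict(state.free)
        next_free[size] -= 1  # allocating consumes one compartment of this size
        score = request.priority + value_fn(LockerState(free=next_free))
        if score > best_score:
            best_action, best_size, best_score = "offer", size, score
    return best_action, best_size


# Example with a toy value function that prizes remaining large compartments.
state = LockerState(free={"S": 3, "M": 1, "L": 0})
toy_value = lambda s: 0.5 * s.free["M"] + 1.0 * s.free["L"]
print(decide(state, Request(size="M", priority=2.0), toy_value))  # ('offer', 'M')
```

The interrelation the abstract highlights is visible even in this toy version: whether offering the locker is worthwhile depends on which compartment the parcel would occupy, because allocating a scarce large compartment to a small parcel forecloses future high-priority requests.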
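The abstract describes the structure-enforcing experience replay only at a high level. One plausible instance, sketched below purely as an assumption, is enforcing that the value function is monotone in free capacity (losing a free compartment should never increase value). The linear VFA, the capacity feature encoding, and the hinge penalty are all hypothetical choices for illustration, not the authors' method.

```python
# Hedged sketch: a replay-based training step that mixes TD regression on
# sampled transitions with a hinge penalty toward monotonicity of the value
# function in free capacity. The specific "structure" is an assumption.
import numpy as np

rng = np.random.default_rng(0)
theta = rng.normal(scale=0.1, size=3)  # one weight per size class (S, M, L)


def v(features: np.ndarray) -> float:
    """Linear value function approximation over capacity features."""
    return float(features @ theta)


def replay_step(buffer, batch_size=32, lr=1e-3, mono_weight=1.0):
    """One step of structured experience replay: regress toward stored TD
    targets, and penalize states whose value would rise if capacity fell."""
    global theta
    idx = rng.choice(len(buffer), size=min(batch_size, len(buffer)), replace=False)
    grad = np.zeros_like(theta)
    for i in idx:
        feats, target = buffer[i]  # (free-capacity features, TD target)
        grad += (v(feats) - target) * feats
        # Structure term: removing one free compartment should not raise V.
        dominated = feats.copy()
        j = rng.integers(len(feats))
        if dominated[j] > 0:
            dominated[j] -= 1
            violation = v(dominated) - v(feats)
            if violation > 0:  # gradient of 0.5 * violation**2
                grad += mono_weight * violation * (dominated - feats)
    theta -= lr * grad / len(idx)


# Usage: replay over a small buffer of (features, target) pairs.
buffer = [(np.array([5.0, 3.0, 1.0]), 2.0), (np.array([2.0, 1.0, 0.0]), 0.8)]
for _ in range(100):
    replay_step(buffer, batch_size=2)
```

The design intuition is that sampled transitions alone may leave parts of the state space unconstrained; adding a structural penalty during replay regularizes the approximation toward shapes the true value function is known (or assumed) to have.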