Reinforcement Learning Approaches for the Orienteering Problem with Stochastic and Dynamic Release Dates

Transportation Science · Pub Date: 2024-07-18 · DOI: 10.1287/trsc.2022.0366 · IF 4.4 · Zone 2 (Engineering & Technology) · JCR Q1 (Operations Research & Management Science)
Yuanyuan Li, Claudia Archetti, Ivana Ljubić

Abstract

In this paper, we study a sequential decision-making problem faced by e-commerce carriers: when to send a vehicle out from the central depot to serve customer requests, and in which order to serve them, under the assumption that the time at which parcels arrive at the depot is stochastic and dynamic. The objective is to maximize the expected number of parcels that can be delivered during service hours. We propose two reinforcement learning (RL) approaches for solving this problem. These approaches rely on a look-ahead strategy in which future release dates are sampled in a Monte Carlo fashion, and a batch approach is used to approximate future routes. Both RL approaches are based on value function approximation: one combines it with a consensus function (VFA-CF) and the other with a two-stage stochastic integer linear programming model (VFA-2S). VFA-CF and VFA-2S do not need extensive training, as they are based on very few hyperparameters and make good use of integer linear programming (ILP) and branch-and-cut–based exact methods to improve the quality of decisions. We also establish sufficient conditions for a partial characterization of the optimal policy and integrate them into VFA-CF/VFA-2S. In an empirical study, we conduct a competitive analysis using upper bounds with perfect information. We also show that VFA-CF and VFA-2S greatly outperform alternative approaches that (1) do not rely on future information, (2) are based on point estimation of future information, (3) use heuristics rather than exact methods, or (4) use exact evaluations of future rewards.

Funding: This work was supported by the CY Initiative of Excellence [ANR-16-IDEX-0008].

Supplemental Material: The online appendices are available at https://doi.org/10.1287/trsc.2022.0366.
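
The look-ahead idea in the abstract (sample future release dates, score each candidate decision per scenario, then combine the scenarios) can be illustrated with a toy sketch. This is not the authors' VFA-CF: the abstract gives no formulas, so everything below is an assumption made for illustration, including the load-dependent trip time (`base + per_stop * n`), the single-later-trip batch approximation, the majority vote standing in for a consensus function, and all function and parameter names.

```python
import random
from collections import Counter


def max_load(depart, deadline, base, per_stop):
    """Most parcels a trip leaving at `depart` can deliver before `deadline`
    (assumed trip time: base + per_stop * parcels)."""
    slack = deadline - depart - base
    return int(slack // per_stop) if slack >= 0 else 0


def best_single_trip(earliest, on_hand, releases, deadline, base, per_stop):
    """Crude batch approximation: best single trip that departs at `earliest`
    or at any later sampled release time."""
    candidates = [earliest] + sorted(r for r in releases if r > earliest)
    best = 0
    for d in candidates:
        available = on_hand + sum(1 for r in releases if r <= d)
        best = max(best, min(available, max_load(d, deadline, base, per_stop)))
    return best


def scenario_value(action, t, on_hand, releases, deadline, base, per_stop):
    """Parcels delivered under one sampled scenario for a given action."""
    if action == "dispatch":
        served = min(on_hand, max_load(t, deadline, base, per_stop))
        back = t + base + per_stop * served if served else t
        return served + best_single_trip(
            back, on_hand - served, releases, deadline, base, per_stop)
    # "wait": hold the vehicle until the next sampled release (if any).
    later = [r for r in releases if r > t]
    earliest = min(later) if later else t
    return best_single_trip(earliest, on_hand, releases, deadline, base, per_stop)


def consensus_decision(t, on_hand, sample_releases, n_scenarios,
                       deadline, base, per_stop, rng):
    """Majority vote over sampled scenarios; ties within a scenario go to 'dispatch'."""
    votes = Counter()
    for _ in range(n_scenarios):
        releases = sample_releases(rng)
        values = {a: scenario_value(a, t, on_hand, releases, deadline, base, per_stop)
                  for a in ("dispatch", "wait")}
        votes[max(values, key=values.get)] += 1
    return votes.most_common(1)[0][0], votes


if __name__ == "__main__":
    # Hypothetical instance: three pending parcels with uniform release times.
    sampler = lambda rng: [rng.uniform(0.0, 8.0) for _ in range(3)]
    action, votes = consensus_decision(
        0.0, 2, sampler, 200, 10.0, 2.0, 1.0, random.Random(0))
    print(action, dict(votes))
```

In this toy model, waiting pays off only when enough parcels are released early enough to fit into one longer trip before the deadline; otherwise dispatching now and returning for a second trip delivers more. The paper replaces these crude pieces with value function approximation and ILP-based exact methods.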
Source journal
Transportation Science (Engineering & Technology: Operations Research & Management Science)
CiteScore: 8.30
Self-citation rate: 10.90%
Articles published: 111
Review time: 12 months
About the journal: Transportation Science, published quarterly by INFORMS, is the flagship journal of the Transportation Science and Logistics Society of INFORMS. As the foremost scientific journal in the cross-disciplinary operational research field of transportation analysis, Transportation Science publishes high-quality original contributions and surveys on phenomena associated with all modes of transportation, present and prospective, including mainly all levels of planning, design, economic, operational, and social aspects. Transportation Science focuses primarily on fundamental theories, coupled with observational and experimental studies of transportation and logistics phenomena and processes, mathematical models, advanced methodologies and novel applications in transportation and logistics systems analysis, planning and design. The journal covers a broad range of topics that include vehicular and human traffic flow theories, models and their application to traffic operations and management; strategic, tactical, and operational planning of transportation and logistics systems; performance analysis methods and system design and optimization; theories and analysis methods for network and spatial activity interaction, equilibrium and dynamics; economics of transportation system supply and evaluation; and methodologies for analysis of transportation user behavior and the demand for transportation and logistics services. Transportation Science is international in scope, with editors from nations around the globe. The editorial board reflects the diverse interdisciplinary interests of the transportation science and logistics community, with members that hold primary affiliations in engineering (civil, industrial, and aeronautical), physics, economics, applied mathematics, and business.