Reinforcement Learning for Solving Multiple Vehicle Routing Problem with Time Window

ACM Transactions on Intelligent Systems and Technology · IF 7.2 · Zone 4 (Computer Science) · JCR Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE) · Pub Date: 2024-01-25 · DOI: 10.1145/3625232
Zefang Zong, Tong Xia, Meng Zheng, Yong Li
Citations: 0

Abstract

The vehicle routing problem with time windows (VRPTW) is of great importance for a wide spectrum of services and real-life applications, such as online take-out and car-hailing platforms. A promising method should generate high-quality solutions within limited inference time, which raises three major challenges: a) directly optimizing the goal under several practical constraints; b) efficiently handling individual time window limits; and c) modeling the cooperation among the vehicle fleet. In this paper, we present an end-to-end reinforcement learning framework to solve VRPTW. First, we propose an agent model that encodes constraints into features as the input and applies a harsh policy to the output when generating deterministic results. Second, we design a time penalty augmented reward to model the time window limits during gradient propagation. Third, we design a task handler to enable cooperation among different vehicles. We perform extensive experiments on two real-world datasets and one public benchmark dataset. Results demonstrate that our solution improves performance by up to 11.7% compared to other RL baselines, and that it generates solutions for instances within seconds, whereas existing heuristic baselines take minutes, while maintaining solution quality. Moreover, we thoroughly analyse our solution and discuss the meaningful implications of its real-time response ability.
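
The abstract names a "time penalty augmented reward" but does not give its formula. As a minimal sketch only, assuming a soft-constraint form in which the reward is the negative travel distance minus a weighted lateness term, the idea could look like the following. The names `Customer`, `route_reward`, `fleet_reward`, the weight `beta`, and the constant `speed` are hypothetical and are not taken from the paper; service times are omitted for brevity.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Customer:
    x: float
    y: float
    ready: float  # earliest time service may start
    due: float    # latest arrival time that incurs no penalty


def route_reward(depot, route, speed=1.0, beta=1.0):
    """Return -(travel distance + beta * total lateness) for one vehicle.

    Assumed form, not the paper's: lateness is treated as a soft penalty
    so that time-window violations still shape the training signal.
    """
    px, py = depot
    clock = 0.0
    distance = 0.0
    lateness = 0.0
    for c in route:
        leg = hypot(c.x - px, c.y - py)
        distance += leg
        clock += leg / speed
        clock = max(clock, c.ready)          # wait if the vehicle is early
        lateness += max(0.0, clock - c.due)  # time past the window's end
        px, py = c.x, c.y
    distance += hypot(depot[0] - px, depot[1] - py)  # return to the depot
    return -(distance + beta * lateness)


def fleet_reward(depot, routes, speed=1.0, beta=1.0):
    """Sum the per-vehicle rewards over the whole fleet."""
    return sum(route_reward(depot, r, speed, beta) for r in routes)
```

Under these assumptions, setting `beta=0` recovers a pure distance objective, while a larger `beta` pushes a learned policy toward respecting time windows at the cost of longer routes.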

Source journal
ACM Transactions on Intelligent Systems and Technology
Subject categories: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; COMPUTER SCIENCE, INFORMATION SYSTEMS
CiteScore: 9.30
Self-citation rate: 2.00%
Articles published: 131
Journal description: ACM Transactions on Intelligent Systems and Technology is a scholarly journal that publishes the highest quality papers on intelligent systems, applicable algorithms and technology with a multi-disciplinary perspective. An intelligent system is one that uses artificial intelligence (AI) techniques to offer important services (e.g., as a component of a larger system) to allow integrated systems to perceive, reason, learn, and act intelligently in the real world. ACM TIST is published quarterly (six issues a year). Each issue has 8-11 regular papers, with around 20 published journal pages or 10,000 words per paper. Additional references, proofs, graphs or detailed experiment results can be submitted as a separate appendix, while excessively lengthy papers will be rejected automatically. Authors can include online-only appendices for additional content of their published papers and are encouraged to share their code and/or data with other readers.