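Multi-Task Multi-Objective Evolutionary Search Based on Deep Reinforcement Learning for Multi-Objective Vehicle Routing Problems with Time Windows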

Symmetry | Pub Date: 2024-08-12 | DOI: 10.3390/sym16081030
Jianjun Deng, Junjie Wang, Xiaojun Wang, Yiqiao Cai, Peizhong Liu
Citations: 0

Abstract

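The vehicle routing problem with time windows (VRPTW) is a combinatorial optimization problem in supply chains and logistics that has been widely studied over the last decade. Recent research has explored deep reinforcement learning (DRL) as a promising approach to the VRPTW. However, solving the VRPTW with many conflicting objectives (MOVRPTW) remains a challenge for DRL. The MOVRPTW considers five conflicting objectives simultaneously: minimizing the number of vehicles required, the total travel distance, the travel time of the longest route, the total waiting time for early arrivals, and the total delay time for late arrivals. To tackle the MOVRPTW, this study introduces MTMO/DRL-AT, a multi-task multi-objective evolutionary search algorithm that makes full use of both DRL and the multitasking mechanism. In MTMO/DRL-AT, a two-objective MOVRPTW is constructed as an assisted task whose objectives are to minimize the total travel distance and the travel time of the longest route. The main task and the assisted task are solved simultaneously in a multitasking scenario. Each task is decomposed into scalar optimization subproblems, which are then solved by an attention model trained using DRL. The outputs of these trained models serve as the initial solutions of MTMO/DRL-AT. The algorithm then applies knowledge transfer and multiple local search operators to further improve the quality of these solutions. Simulation results on real-world benchmarks show that MTMO/DRL-AT outperforms several other algorithms in solving the MOVRPTW.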
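The abstract does not give formulas for the five objectives or name the scalarization used in the decomposition, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes Euclidean distances with travel time equal to distance, soft time windows (an early vehicle waits until the window opens; a late vehicle is still served and accrues delay), and takes the "travel time of the longest route" to be a route's total duration until it returns to the depot. It then shows one common decomposition, Tchebycheff scalarization over evenly spaced weight vectors, applied to the two-objective assisted task. The names `Node`, `evaluate`, `travel`, `tchebycheff`, and `weight_vectors_2d` are hypothetical, and the DRL-trained attention model that solves each subproblem in the paper is not shown.

```python
import math
from dataclasses import dataclass


@dataclass
class Node:
    x: float
    y: float
    ready: float    # time window opens (earliest service start)
    due: float      # time window closes (latest on-time service start)
    service: float  # service duration at this node


def travel(a: Node, b: Node) -> float:
    # Assumption: Euclidean distance, and travel time equals distance (unit speed).
    return math.hypot(a.x - b.x, a.y - b.y)


def evaluate(routes, nodes):
    """Return (vehicles, total distance, longest route time, total wait, total delay).

    routes: list of routes, each a list of customer indices; nodes[0] is the depot.
    Assumes soft time windows: early arrivals wait, late arrivals are served with delay.
    """
    depot = nodes[0]
    vehicles, total_dist, longest, total_wait, total_delay = 0, 0.0, 0.0, 0.0, 0.0
    for route in routes:
        if not route:
            continue  # an empty route uses no vehicle
        vehicles += 1
        t, prev = depot.ready, depot
        for idx in route:
            node = nodes[idx]
            d = travel(prev, node)
            total_dist += d
            t += d
            if t < node.ready:       # early: wait for the window to open
                total_wait += node.ready - t
                t = node.ready
            elif t > node.due:       # late: served anyway, delay accrues
                total_delay += t - node.due
            t += node.service
            prev = node
        d = travel(prev, depot)      # return to the depot
        total_dist += d
        t += d
        longest = max(longest, t - depot.ready)
    return vehicles, total_dist, longest, total_wait, total_delay


def tchebycheff(f, weights, ideal):
    # Scalar subproblem value: max_i w_i * |f_i - z_i|, with z an ideal (reference) point.
    return max(w * abs(fi - zi) for w, fi, zi in zip(weights, f, ideal))


def weight_vectors_2d(n):
    # n evenly spaced weight vectors (n >= 2) for a two-objective task.
    return [(i / (n - 1), 1 - i / (n - 1)) for i in range(n)]


if __name__ == "__main__":
    nodes = [
        Node(0, 0, 0, 100, 0),  # depot
        Node(3, 4, 0, 10, 1),   # customer 1
        Node(6, 8, 20, 30, 1),  # customer 2
    ]
    objs = evaluate([[1, 2]], nodes)
    print(objs)  # (1, 20.0, 31.0, 9.0, 0.0)
    # Assisted task: total distance and longest route time, with ideal point (0, 0).
    assisted = objs[1:3]
    print([tchebycheff(assisted, w, (0.0, 0.0)) for w in weight_vectors_2d(3)])
    # [31.0, 15.5, 20.0]
```

Each weight vector defines one scalar subproblem, and a solver (in the paper, the DRL-trained attention model) would look for the route set with the smallest scalar value for it. In practice the ideal point is usually estimated from the best value seen so far for each objective, and a weighted-sum scalarization could be swapped in by replacing `tchebycheff`.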