Multi-Task Multi-Objective Evolutionary Search Based on Deep Reinforcement Learning for Multi-Objective Vehicle Routing Problems with Time Windows

Symmetry, Pub Date: 2024-08-12, DOI: 10.3390/sym16081030

Jianjun Deng, Junjie Wang, Xiaojun Wang, Yiqiao Cai, Peizhong Liu



Abstract

The vehicle routing problem with time windows (VRPTW) is a combinatorial optimization problem that has been widely studied in supply chain and logistics research over the last decade. Recent research has explored deep reinforcement learning (DRL) as a promising approach to the VRPTW. However, solving the VRPTW with many conflicting objectives (MOVRPTW) remains a challenge for DRL. The MOVRPTW considers five conflicting objectives simultaneously: minimizing the number of vehicles required, the total travel distance, the travel time of the longest route, the total waiting time for early arrivals, and the total delay time for late arrivals. To tackle the MOVRPTW, this study introduces MTMO/DRL-AT, a multi-task multi-objective evolutionary search algorithm that combines DRL with a multitasking mechanism. In MTMO/DRL-AT, a two-objective MOVRPTW is constructed as an assisted task whose objectives are to minimize the total travel distance and the travel time of the longest route. The main task and the assisted task are solved simultaneously in a multitasking scenario. Each task is decomposed into scalar optimization subproblems, each of which is solved by an attention model trained using DRL. The outputs of the trained models serve as the initial solutions for MTMO/DRL-AT. The algorithm then applies knowledge transfer and multiple local search operators to further improve the quality of these solutions. Simulation results on real-world benchmarks show that MTMO/DRL-AT outperforms several other algorithms in solving the MOVRPTW.
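To make the five objectives and the idea of a scalar subproblem concrete, the following is a minimal sketch, not the authors' implementation. It rests on assumptions the abstract does not state: Euclidean distances, unit travel speed, soft time windows, "travel time of the longest route" measured as the completion time of the slowest route (including waiting and service), and a plain weighted sum as the scalarizing function. The names `Customer`, `evaluate`, and `weighted_sum` are hypothetical.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Customer:
    x: float
    y: float
    ready: float    # earliest allowed service start
    due: float      # latest allowed service start (soft, assumption)
    service: float  # service duration


def evaluate(routes, depot, customers):
    """Return (vehicles, distance, longest_route_time, waiting, delay).

    Assumes unit speed, so travel time equals Euclidean distance.
    """
    used = 0
    total_dist = longest = waiting = delay = 0.0
    for route in routes:
        if not route:
            continue  # an empty route does not consume a vehicle
        used += 1
        t = 0.0
        px, py = depot
        for idx in route:
            c = customers[idx]
            d = hypot(c.x - px, c.y - py)
            total_dist += d
            t += d
            if t < c.ready:      # early arrival: wait until the window opens
                waiting += c.ready - t
                t = c.ready
            elif t > c.due:      # late arrival: accumulate the delay
                delay += t - c.due
            t += c.service
            px, py = c.x, c.y
        d = hypot(depot[0] - px, depot[1] - py)  # return to the depot
        total_dist += d
        t += d
        longest = max(longest, t)
    return (float(used), total_dist, longest, waiting, delay)


def weighted_sum(objectives, weights):
    """Scalarize an objective vector; one weight vector gives one subproblem."""
    return sum(w * f for w, f in zip(weights, objectives))


if __name__ == "__main__":
    depot = (0.0, 0.0)
    customers = [Customer(3, 4, 0, 10, 1), Customer(6, 8, 20, 30, 1)]
    f = evaluate([[0, 1]], depot, customers)
    print("main task objectives:", f)
    # Assisted task (assumption): only total distance and longest route time.
    print("assisted subproblem value:", weighted_sum(f[1:3], (0.5, 0.5)))
```

Under these assumptions, the main task scalarizes all five entries of the tuple, while the assisted task uses only the second and third, which is why the sketch slices `f[1:3]`. Varying the weight vector yields the family of scalar subproblems that the abstract says are handed to the trained attention models.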


