{"title":"用强化学习解决带时间窗口的多车路由问题","authors":"Zefang Zong, Tong Xia, Meng Zheng, Yong Li","doi":"10.1145/3625232","DOIUrl":null,"url":null,"abstract":"<p>Vehicle routing problem with time window (VRPTW) is of great importance for a wide spectrum of services and real-life applications, such as online take-out and car-hailing platforms. A promising method should generate high-qualified solutions within limited inference time, and there are three major challenges: a) directly optimizing the goal with several practical constraints; b) efficiently handling individual time window limits; and c) modeling the cooperation among the vehicle fleet. In this paper, we present an end-to-end reinforcement learning framework to solve VRPTW. First, we propose an agent model that encodes constraints into features as the input, and conducts harsh policy on the output when generating deterministic results. Second, we design a time penalty augmented reward to model the time window limits during gradient propagation. Third, we design a task handler to enable the cooperation among different vehicles. We perform extensive experiments on two real-world datasets and one public benchmark dataset. Results demonstrate that our solution improves the performance by up to \\(11.7\\% \\) compared to other RL baselines, and could generate solutions for instances within seconds while existing heuristic baselines take for minutes as well as maintaining the quality of solutions. Moreover, our solution is thoroughly analysed with meaningful implications due to the real-time response ability.</p>","PeriodicalId":48967,"journal":{"name":"ACM Transactions on Intelligent Systems and Technology","volume":"161 1","pages":""},"PeriodicalIF":7.2000,"publicationDate":"2024-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Reinforcement Learning for Solving Multiple Vehicle Routing Problem with Time Window\",\"authors\":\"Zefang Zong, Tong Xia, Meng Zheng, Yong Li\",\"doi\":\"10.1145/3625232\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Vehicle routing problem with time window (VRPTW) is of great importance for a wide spectrum of services and real-life applications, such as online take-out and car-hailing platforms. A promising method should generate high-qualified solutions within limited inference time, and there are three major challenges: a) directly optimizing the goal with several practical constraints; b) efficiently handling individual time window limits; and c) modeling the cooperation among the vehicle fleet. In this paper, we present an end-to-end reinforcement learning framework to solve VRPTW. First, we propose an agent model that encodes constraints into features as the input, and conducts harsh policy on the output when generating deterministic results. Second, we design a time penalty augmented reward to model the time window limits during gradient propagation. Third, we design a task handler to enable the cooperation among different vehicles. We perform extensive experiments on two real-world datasets and one public benchmark dataset. Results demonstrate that our solution improves the performance by up to \\\\(11.7\\\\% \\\\) compared to other RL baselines, and could generate solutions for instances within seconds while existing heuristic baselines take for minutes as well as maintaining the quality of solutions. Moreover, our solution is thoroughly analysed with meaningful implications due to the real-time response ability.</p>\",\"PeriodicalId\":48967,\"journal\":{\"name\":\"ACM Transactions on Intelligent Systems and Technology\",\"volume\":\"161 1\",\"pages\":\"\"},\"PeriodicalIF\":7.2000,\"publicationDate\":\"2024-01-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Intelligent Systems and Technology\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1145/3625232\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Intelligent Systems and Technology","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3625232","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Reinforcement Learning for Solving Multiple Vehicle Routing Problem with Time Window
Vehicle routing problem with time window (VRPTW) is of great importance for a wide spectrum of services and real-life applications, such as online take-out and car-hailing platforms. A promising method should generate high-qualified solutions within limited inference time, and there are three major challenges: a) directly optimizing the goal with several practical constraints; b) efficiently handling individual time window limits; and c) modeling the cooperation among the vehicle fleet. In this paper, we present an end-to-end reinforcement learning framework to solve VRPTW. First, we propose an agent model that encodes constraints into features as the input, and conducts harsh policy on the output when generating deterministic results. Second, we design a time penalty augmented reward to model the time window limits during gradient propagation. Third, we design a task handler to enable the cooperation among different vehicles. We perform extensive experiments on two real-world datasets and one public benchmark dataset. Results demonstrate that our solution improves the performance by up to \(11.7\% \) compared to other RL baselines, and could generate solutions for instances within seconds while existing heuristic baselines take for minutes as well as maintaining the quality of solutions. Moreover, our solution is thoroughly analysed with meaningful implications due to the real-time response ability.
期刊介绍:
ACM Transactions on Intelligent Systems and Technology is a scholarly journal that publishes the highest quality papers on intelligent systems, applicable algorithms and technology with a multi-disciplinary perspective. An intelligent system is one that uses artificial intelligence (AI) techniques to offer important services (e.g., as a component of a larger system) to allow integrated systems to perceive, reason, learn, and act intelligently in the real world.
ACM TIST is published quarterly (six issues a year). Each issue has 8-11 regular papers, with around 20 published journal pages or 10,000 words per paper. Additional references, proofs, graphs or detailed experiment results can be submitted as a separate appendix, while excessively lengthy papers will be rejected automatically. Authors can include online-only appendices for additional content of their published papers and are encouraged to share their code and/or data with other readers.