A Graph Reinforcement Learning Framework for Neural Adaptive Large Neighbourhood Search

IF 4.1 · CAS Region 2 (Engineering & Technology) · JCR Q2 (Computer Science, Interdisciplinary Applications) · Computers & Operations Research · Pub Date: 2024-08-02 · DOI: 10.1016/j.cor.2024.106791
Syu-Ning Johnn , Victor-Alexandru Darvariu , Julia Handl , Jörg Kalcsics
Computers & Operations Research, Vol. 172, Article 106791. Open-access PDF: https://www.sciencedirect.com/science/article/pii/S0305054824002636/pdfft?md5=03a7927599315f8e665c819357d5a172&pid=1-s2.0-S0305054824002636-main.pdf
Citations: 0

Abstract

A Graph Reinforcement Learning Framework for Neural Adaptive Large Neighbourhood Search

Adaptive Large Neighbourhood Search (ALNS) is a popular metaheuristic with renowned efficiency in solving combinatorial optimisation problems. However, despite 18 years of intensive research into ALNS, the design of an effective adaptive layer for selecting operators to improve the solution remains an open question. In this work, we isolate this problem by formulating it as a Markov Decision Process, in which an agent is rewarded proportionally to the improvement of the incumbent. We propose Graph Reinforcement Learning for Operator Selection (GRLOS), a method based on Deep Reinforcement Learning and Graph Neural Networks, as well as Learned Roulette Wheel (LRW), a lightweight approach inspired by the classic Roulette Wheel adaptive layer. The methods, which are broadly applicable to optimisation problems that can be represented as graphs, are comprehensively evaluated on 5 routing problems using a large portfolio of 28 destroy and 7 repair operators. Results show that both GRLOS and LRW outperform the classic selection mechanism in ALNS, owing to the operator choices being learned in a prior training phase. GRLOS is also shown to consistently achieve better performance than a recent Deep Reinforcement Learning method due to its substantially more flexible state representation. The evaluation further examines the impact of the operator budget and type of initial solution, and is applied to problem instances with up to 1000 customers. The findings arising from our extensive benchmarking bear relevance to the wider literature of hybrid methods combining metaheuristics and machine learning.
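To make the mechanism the abstract contrasts against concrete, the classic ALNS adaptive layer selects destroy and repair operators by roulette wheel and updates their weights from a reward proportional to the improvement of the incumbent solution. The sketch below is illustrative only, not the paper's implementation: the greedy acceptance rule, the weight-update constants, and the toy cost function in the usage note are all assumptions.

```python
import random

def roulette_select(weights):
    """Pick an index with probability proportional to its weight."""
    total = sum(weights)
    r = random.uniform(0.0, total)
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if r <= acc:
            return i
    return len(weights) - 1  # guard against floating-point drift

def alns(initial, cost, destroy_ops, repair_ops, iters=1000, reaction=0.1):
    """Minimal ALNS loop with a Roulette Wheel adaptive layer.

    Uses greedy acceptance for brevity; classic ALNS typically adds a
    simulated-annealing acceptance criterion. The reward is proportional
    to the improvement of the incumbent, mirroring the MDP reward the
    abstract describes.
    """
    incumbent = best = initial
    w_destroy = [1.0] * len(destroy_ops)
    w_repair = [1.0] * len(repair_ops)
    for _ in range(iters):
        d = roulette_select(w_destroy)
        r = roulette_select(w_repair)
        # Destroy part of the incumbent, then repair it into a candidate.
        candidate = repair_ops[r](destroy_ops[d](incumbent))
        improvement = cost(incumbent) - cost(candidate)
        if improvement > 0:
            incumbent = candidate
            if cost(candidate) < cost(best):
                best = candidate
        # Exponential smoothing of operator weights; the base score of 1
        # keeps weights strictly positive so every operator stays selectable.
        score = 1.0 + max(0.0, improvement)
        w_destroy[d] = (1 - reaction) * w_destroy[d] + reaction * score
        w_repair[r] = (1 - reaction) * w_repair[r] + reaction * score
    return best
```

The learned approaches in the paper (GRLOS, LRW) replace the weight-smoothing step with operator choices learned in a prior training phase rather than adapted online from scratch on each instance.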

Source journal: Computers & Operations Research (Engineering & Technology — Industrial Engineering)
CiteScore: 8.60
Self-citation rate: 8.70%
Annual article output: 292
Average review time: 8.5 months
Journal introduction: Operations research and computers meet in a large number of scientific fields, many of which are of vital current concern to our troubled society. These include, among others, ecology, transportation, safety, reliability, urban planning, economics, inventory control, investment strategy and logistics (including reverse logistics). Computers & Operations Research provides an international forum for the application of computers and operations research techniques to problems in these and related fields.
Latest articles in this journal:
- Understand your decision rather than your model prescription: Towards explainable deep learning approaches for commodity procurement
- Airline recovery problem under disruptions: A review
- A decomposition scheme for Wasserstein distributionally robust emergency relief network design under demand uncertainty and social donations
- Scheduling AMSs with generalized Petri nets and highly informed heuristic search
- Efficient arc-flow formulations for makespan minimisation on parallel machines with a common server