A Graph Reinforcement Learning Framework for Neural Adaptive Large Neighbourhood Search

IF 4.1 · CAS Region 2 (Engineering & Technology) · JCR Q2 (Computer Science, Interdisciplinary Applications) · Computers & Operations Research · Pub Date: 2024-08-02 · DOI: 10.1016/j.cor.2024.106791
Syu-Ning Johnn , Victor-Alexandru Darvariu , Julia Handl , Jörg Kalcsics
Computers & Operations Research, Vol. 172, Article 106791. Open-access PDF: https://www.sciencedirect.com/science/article/pii/S0305054824002636/pdfft?md5=03a7927599315f8e665c819357d5a172&pid=1-s2.0-S0305054824002636-main.pdf
Citations: 0

Abstract

A Graph Reinforcement Learning Framework for Neural Adaptive Large Neighbourhood Search

Adaptive Large Neighbourhood Search (ALNS) is a popular metaheuristic with renowned efficiency in solving combinatorial optimisation problems. However, despite 18 years of intensive research into ALNS, the design of an effective adaptive layer for selecting operators to improve the solution remains an open question. In this work, we isolate this problem by formulating it as a Markov Decision Process, in which an agent is rewarded proportionally to the improvement of the incumbent. We propose Graph Reinforcement Learning for Operator Selection (GRLOS), a method based on Deep Reinforcement Learning and Graph Neural Networks, as well as Learned Roulette Wheel (LRW), a lightweight approach inspired by the classic Roulette Wheel adaptive layer. The methods, which are broadly applicable to optimisation problems that can be represented as graphs, are comprehensively evaluated on 5 routing problems using a large portfolio of 28 destroy and 7 repair operators. Results show that both GRLOS and LRW outperform the classic selection mechanism in ALNS, owing to the operator choices being learned in a prior training phase. GRLOS is also shown to consistently achieve better performance than a recent Deep Reinforcement Learning method due to its substantially more flexible state representation. The evaluation further examines the impact of the operator budget and type of initial solution, and is applied to problem instances with up to 1000 customers. The findings arising from our extensive benchmarking bear relevance to the wider literature of hybrid methods combining metaheuristics and machine learning.
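To make the mechanism the abstract contrasts against concrete, the classic ALNS adaptive layer selects destroy and repair operators by roulette wheel and updates their weights from a reward proportional to the improvement of the incumbent solution. The sketch below is illustrative only, not the paper's implementation: the greedy acceptance rule, the weight-update constants, and the toy cost function in the usage note are all assumptions.

```python
import random

def roulette_select(weights):
    """Pick an index with probability proportional to its weight."""
    total = sum(weights)
    r = random.uniform(0.0, total)
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if r <= acc:
            return i
    return len(weights) - 1  # guard against floating-point drift

def alns(initial, cost, destroy_ops, repair_ops, iters=1000, reaction=0.1):
    """Minimal ALNS loop with a Roulette Wheel adaptive layer.

    Uses greedy acceptance for brevity; classic ALNS typically adds a
    simulated-annealing acceptance criterion. The reward is proportional
    to the improvement of the incumbent, mirroring the MDP reward the
    abstract describes.
    """
    incumbent = best = initial
    w_destroy = [1.0] * len(destroy_ops)
    w_repair = [1.0] * len(repair_ops)
    for _ in range(iters):
        d = roulette_select(w_destroy)
        r = roulette_select(w_repair)
        # Destroy part of the incumbent, then repair it into a candidate.
        candidate = repair_ops[r](destroy_ops[d](incumbent))
        improvement = cost(incumbent) - cost(candidate)
        if improvement > 0:
            incumbent = candidate
            if cost(candidate) < cost(best):
                best = candidate
        # Exponential smoothing of operator weights; the base score of 1
        # keeps weights strictly positive so every operator stays selectable.
        score = 1.0 + max(0.0, improvement)
        w_destroy[d] = (1 - reaction) * w_destroy[d] + reaction * score
        w_repair[r] = (1 - reaction) * w_repair[r] + reaction * score
    return best
```

The learned approaches in the paper (GRLOS, LRW) replace the weight-smoothing step with operator choices learned in a prior training phase rather than adapted online from scratch on each instance.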

Source journal: Computers & Operations Research (Engineering & Technology — Industrial Engineering)
CiteScore: 8.60
Self-citation rate: 8.70%
Annual article output: 292
Average review time: 8.5 months
Journal introduction: Operations research and computers meet in a large number of scientific fields, many of which are of vital current concern to our troubled society. These include, among others, ecology, transportation, safety, reliability, urban planning, economics, inventory control, investment strategy and logistics (including reverse logistics). Computers & Operations Research provides an international forum for the application of computers and operations research techniques to problems in these and related fields.
Latest articles in this journal:
- Understand your decision rather than your model prescription: Towards explainable deep learning approaches for commodity procurement
- Airline recovery problem under disruptions: A review
- A decomposition scheme for Wasserstein distributionally robust emergency relief network design under demand uncertainty and social donations
- Scheduling AMSs with generalized Petri nets and highly informed heuristic search
- Efficient arc-flow formulations for makespan minimisation on parallel machines with a common server