Reinforcement Learning as an Improvement Heuristic for Real-World Production Scheduling

Arthur Müller, Lukas Vollenkemper
{"title":"强化学习作为改进现实世界生产调度的启发式方法","authors":"Arthur Müller, Lukas Vollenkemper","doi":"arxiv-2409.11933","DOIUrl":null,"url":null,"abstract":"The integration of Reinforcement Learning (RL) with heuristic methods is an\nemerging trend for solving optimization problems, which leverages RL's ability\nto learn from the data generated during the search process. One promising\napproach is to train an RL agent as an improvement heuristic, starting with a\nsuboptimal solution that is iteratively improved by applying small changes. We\napply this approach to a real-world multiobjective production scheduling\nproblem. Our approach utilizes a network architecture that includes Transformer\nencoding to learn the relationships between jobs. Afterwards, a probability\nmatrix is generated from which pairs of jobs are sampled and then swapped to\nimprove the solution. We benchmarked our approach against other heuristics\nusing real data from our industry partner, demonstrating its superior\nperformance.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":"18 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Reinforcement Learning as an Improvement Heuristic for Real-World Production Scheduling\",\"authors\":\"Arthur Müller, Lukas Vollenkemper\",\"doi\":\"arxiv-2409.11933\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The integration of Reinforcement Learning (RL) with heuristic methods is an\\nemerging trend for solving optimization problems, which leverages RL's ability\\nto learn from the data generated during the search process. One promising\\napproach is to train an RL agent as an improvement heuristic, starting with a\\nsuboptimal solution that is iteratively improved by applying small changes. We\\napply this approach to a real-world multiobjective production scheduling\\nproblem. Our approach utilizes a network architecture that includes Transformer\\nencoding to learn the relationships between jobs. Afterwards, a probability\\nmatrix is generated from which pairs of jobs are sampled and then swapped to\\nimprove the solution. We benchmarked our approach against other heuristics\\nusing real data from our industry partner, demonstrating its superior\\nperformance.\",\"PeriodicalId\":501301,\"journal\":{\"name\":\"arXiv - CS - Machine Learning\",\"volume\":\"18 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Machine Learning\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2409.11933\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Machine Learning","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.11933","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

The integration of Reinforcement Learning (RL) with heuristic methods is an emerging trend for solving optimization problems, which leverages RL's ability to learn from the data generated during the search process. One promising approach is to train an RL agent as an improvement heuristic, starting with a suboptimal solution that is iteratively improved by applying small changes. We apply this approach to a real-world multiobjective production scheduling problem. Our approach utilizes a network architecture that includes Transformer encoding to learn the relationships between jobs. Afterwards, a probability matrix is generated from which pairs of jobs are sampled and then swapped to improve the solution. We benchmarked our approach against other heuristics using real data from our industry partner, demonstrating its superior performance.
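To make the pipeline described in the abstract concrete, here is a minimal sketch in PyTorch. It is not the authors' implementation: the network sizes, the dot-product pair scoring, the placeholder schedule_cost, the static job features, and the greedy accept/revert loop (standing in for a trained RL policy) are all illustrative assumptions.

```python
# Illustrative sketch of an RL-style improvement heuristic for scheduling.
# Assumptions (not from the paper): jobs are fixed-size feature vectors,
# `schedule_cost` is a dummy stand-in for the real multiobjective cost,
# and greedy accept/revert replaces the RL training loop.
import torch
import torch.nn as nn

class SwapPolicy(nn.Module):
    """Encodes jobs with a Transformer and scores every job pair for swapping."""

    def __init__(self, feat_dim: int, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        self.embed = nn.Linear(feat_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, jobs: torch.Tensor) -> torch.Tensor:
        # jobs: (1, n_jobs, feat_dim) -> swap logits of shape (n_jobs, n_jobs)
        h = self.encoder(self.embed(jobs)).squeeze(0)    # (n_jobs, d_model)
        logits = h @ h.T                                 # pairwise compatibility
        mask = torch.eye(h.size(0), dtype=torch.bool)    # forbid self-swaps
        return logits.masked_fill(mask, float("-inf"))

def schedule_cost(order: list[int]) -> float:
    """Hypothetical placeholder for the multiobjective cost (lower is better)."""
    return float(sum(abs(pos - job) for pos, job in enumerate(order)))

@torch.no_grad()
def improve(order: list[int], jobs: torch.Tensor, policy: SwapPolicy,
            steps: int = 200) -> tuple[list[int], float]:
    """Iteratively sample a job pair from the probability matrix and swap it."""
    n, best = len(order), schedule_cost(order)
    for _ in range(steps):
        probs = torch.softmax(policy(jobs).flatten(), dim=0)  # probability matrix
        i, j = divmod(int(torch.multinomial(probs, 1)), n)    # sample a job pair
        order[i], order[j] = order[j], order[i]               # apply the swap
        cost = schedule_cost(order)
        if cost < best:
            best = cost                                       # keep the improvement
        else:
            order[i], order[j] = order[j], order[i]           # revert otherwise
    return order, best

# Usage: start from a suboptimal order and let the heuristic refine it.
n_jobs, feat_dim = 8, 5
policy = SwapPolicy(feat_dim)
jobs = torch.randn(1, n_jobs, feat_dim)                 # dummy job features
order, cost = improve(torch.randperm(n_jobs).tolist(), jobs, policy)
```

In the actual approach, the policy would be trained with reinforcement learning (for example, with the swap's effect on the objective serving as reward), so that sampled pairs are informed by the learned job relationships rather than the random initialization used here; the abstract also leaves open how job features and the multiobjective cost are defined.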