Reinforcement Learning as an Improvement Heuristic for Real-World Production Scheduling

Arthur Müller, Lukas Vollenkemper
{"title":"强化学习作为改进现实世界生产调度的启发式方法","authors":"Arthur Müller, Lukas Vollenkemper","doi":"arxiv-2409.11933","DOIUrl":null,"url":null,"abstract":"The integration of Reinforcement Learning (RL) with heuristic methods is an\nemerging trend for solving optimization problems, which leverages RL's ability\nto learn from the data generated during the search process. One promising\napproach is to train an RL agent as an improvement heuristic, starting with a\nsuboptimal solution that is iteratively improved by applying small changes. We\napply this approach to a real-world multiobjective production scheduling\nproblem. Our approach utilizes a network architecture that includes Transformer\nencoding to learn the relationships between jobs. Afterwards, a probability\nmatrix is generated from which pairs of jobs are sampled and then swapped to\nimprove the solution. We benchmarked our approach against other heuristics\nusing real data from our industry partner, demonstrating its superior\nperformance.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":"18 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Reinforcement Learning as an Improvement Heuristic for Real-World Production Scheduling\",\"authors\":\"Arthur Müller, Lukas Vollenkemper\",\"doi\":\"arxiv-2409.11933\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The integration of Reinforcement Learning (RL) with heuristic methods is an\\nemerging trend for solving optimization problems, which leverages RL's ability\\nto learn from the data generated during the search process. One promising\\napproach is to train an RL agent as an improvement heuristic, starting with a\\nsuboptimal solution that is iteratively improved by applying small changes. We\\napply this approach to a real-world multiobjective production scheduling\\nproblem. Our approach utilizes a network architecture that includes Transformer\\nencoding to learn the relationships between jobs. Afterwards, a probability\\nmatrix is generated from which pairs of jobs are sampled and then swapped to\\nimprove the solution. We benchmarked our approach against other heuristics\\nusing real data from our industry partner, demonstrating its superior\\nperformance.\",\"PeriodicalId\":501301,\"journal\":{\"name\":\"arXiv - CS - Machine Learning\",\"volume\":\"18 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Machine Learning\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2409.11933\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Machine Learning","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.11933","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

The integration of Reinforcement Learning (RL) with heuristic methods is an emerging trend for solving optimization problems, which leverages RL's ability to learn from the data generated during the search process. One promising approach is to train an RL agent as an improvement heuristic, starting with a suboptimal solution that is iteratively improved by applying small changes. We apply this approach to a real-world multiobjective production scheduling problem. Our approach utilizes a network architecture that includes Transformer encoding to learn the relationships between jobs. Afterwards, a probability matrix is generated from which pairs of jobs are sampled and then swapped to improve the solution. We benchmarked our approach against other heuristics using real data from our industry partner, demonstrating its superior performance.
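To make the pipeline described in the abstract concrete, here is a minimal sketch in PyTorch. It is not the authors' implementation: the network sizes, the dot-product pair scoring, the placeholder schedule_cost, the static job features, and the greedy accept/revert loop (standing in for a trained RL policy) are all illustrative assumptions.

```python
# Illustrative sketch of an RL-style improvement heuristic for scheduling.
# Assumptions (not from the paper): jobs are fixed-size feature vectors,
# `schedule_cost` is a dummy stand-in for the real multiobjective cost,
# and greedy accept/revert replaces the RL training loop.
import torch
import torch.nn as nn

class SwapPolicy(nn.Module):
    """Encodes jobs with a Transformer and scores every job pair for swapping."""

    def __init__(self, feat_dim: int, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        self.embed = nn.Linear(feat_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, jobs: torch.Tensor) -> torch.Tensor:
        # jobs: (1, n_jobs, feat_dim) -> swap logits of shape (n_jobs, n_jobs)
        h = self.encoder(self.embed(jobs)).squeeze(0)    # (n_jobs, d_model)
        logits = h @ h.T                                 # pairwise compatibility
        mask = torch.eye(h.size(0), dtype=torch.bool)    # forbid self-swaps
        return logits.masked_fill(mask, float("-inf"))

def schedule_cost(order: list[int]) -> float:
    """Hypothetical placeholder for the multiobjective cost (lower is better)."""
    return float(sum(abs(pos - job) for pos, job in enumerate(order)))

@torch.no_grad()
def improve(order: list[int], jobs: torch.Tensor, policy: SwapPolicy,
            steps: int = 200) -> tuple[list[int], float]:
    """Iteratively sample a job pair from the probability matrix and swap it."""
    n, best = len(order), schedule_cost(order)
    for _ in range(steps):
        probs = torch.softmax(policy(jobs).flatten(), dim=0)  # probability matrix
        i, j = divmod(int(torch.multinomial(probs, 1)), n)    # sample a job pair
        order[i], order[j] = order[j], order[i]               # apply the swap
        cost = schedule_cost(order)
        if cost < best:
            best = cost                                       # keep the improvement
        else:
            order[i], order[j] = order[j], order[i]           # revert otherwise
    return order, best

# Usage: start from a suboptimal order and let the heuristic refine it.
n_jobs, feat_dim = 8, 5
policy = SwapPolicy(feat_dim)
jobs = torch.randn(1, n_jobs, feat_dim)                 # dummy job features
order, cost = improve(torch.randperm(n_jobs).tolist(), jobs, policy)
```

In the actual approach, the policy would be trained with reinforcement learning (for example, with the swap's effect on the objective serving as reward), so that sampled pairs are informed by the learned job relationships rather than the random initialization used here; the abstract also leaves open how job features and the multiobjective cost are defined.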