Reinforcement Learning as an Improvement Heuristic for Real-World Production Scheduling
Arthur Müller, Lukas Vollenkemper
arXiv - CS - Machine Learning, 2024-09-18
DOI: arxiv-2409.11933 (https://doi.org/arxiv-2409.11933)
Citations: 0
Abstract
The integration of Reinforcement Learning (RL) with heuristic methods is an
emerging trend for solving optimization problems, which leverages RL's ability
to learn from the data generated during the search process. One promising
approach is to train an RL agent as an improvement heuristic, starting with a
suboptimal solution that is iteratively improved by applying small changes. We
apply this approach to a real-world multiobjective production scheduling
problem. Our approach utilizes a network architecture that includes Transformer
encoding to learn the relationships between jobs. Afterwards, a probability
matrix is generated from which pairs of jobs are sampled and then swapped to
improve the solution. We benchmarked our approach against other heuristics
using real data from our industry partner, demonstrating its superior
performance.
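
The sample-and-swap step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the score function is a uniform placeholder standing in for the Transformer-based network, the single-objective total-tardiness cost replaces the paper's multiobjective problem, and the job data and all function names (`softmax_matrix`, `sample_pair`, `swap_jobs`, `improve`) are hypothetical. Accepting every sampled swap and remembering the best schedule seen is likewise an assumption made for illustration.

```python
import math
import random

# Hypothetical toy data: job name -> (processing time, due date).
JOBS = {"A": (4, 5), "B": (2, 3), "C": (6, 14), "D": (3, 8), "E": (1, 2)}


def total_tardiness(schedule):
    """Toy single-objective cost: sum of lateness over all jobs."""
    t, tardiness = 0, 0
    for name in schedule:
        duration, due = JOBS[name]
        t += duration
        tardiness += max(0, t - due)
    return tardiness


def uniform_scores(schedule):
    """Placeholder for a learned policy: every pair gets the same score."""
    n = len(schedule)
    return [[0.0] * n for _ in range(n)]


def softmax_matrix(scores):
    """Turn pair scores into one distribution over all (i, j) with i != j."""
    n = len(scores)
    top = max(scores[i][j] for i in range(n) for j in range(n) if i != j)
    exp = [[math.exp(scores[i][j] - top) if i != j else 0.0
            for j in range(n)] for i in range(n)]
    total = sum(map(sum, exp))
    return [[value / total for value in row] for row in exp]


def sample_pair(probs, rng):
    """Draw one pair of positions according to the probability matrix."""
    n = len(probs)
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    weights = [probs[i][j] for i, j in pairs]
    return rng.choices(pairs, weights=weights, k=1)[0]


def swap_jobs(schedule, i, j):
    """Return a copy of the schedule with positions i and j exchanged."""
    new = list(schedule)
    new[i], new[j] = new[j], new[i]
    return new


def improve(schedule, cost, score_fn, steps, rng):
    """Apply sampled swaps repeatedly and keep the best schedule seen."""
    current = list(schedule)
    best, best_cost = list(current), cost(current)
    for _ in range(steps):
        probs = softmax_matrix(score_fn(current))
        i, j = sample_pair(probs, rng)
        current = swap_jobs(current, i, j)  # assumption: every swap is accepted
        current_cost = cost(current)
        if current_cost < best_cost:
            best, best_cost = list(current), current_cost
    return best, best_cost


initial = ["C", "A", "D", "B", "E"]
best, best_cost = improve(initial, total_tardiness, uniform_scores,
                          steps=200, rng=random.Random(0))
print(initial, total_tardiness(initial))
print(best, best_cost)
```

With a trained network in place of `uniform_scores`, the probability matrix would concentrate on swaps that tend to lower the cost, which is the part this sketch leaves out.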