A Deep Reinforcement Learning Method Based on a Transformer Model for the Flexible Job Shop Scheduling Problem

IF 2.6 · Region 3 (Engineering and Technology) · JCR Q2: COMPUTER SCIENCE, INFORMATION SYSTEMS · Electronics · Pub Date: 2024-09-18 · DOI: 10.3390/electronics13183696

Shuai Xu, Yanwu Li, Qiuyang Li


Citations: 0

Abstract

The flexible job shop scheduling problem (FJSSP) is a mathematical optimization problem that arises widely in modern manufacturing, and solving it well can significantly enhance production efficiency. However, because the problem is NP-hard, finding an optimal solution for every scenario within a reasonable time frame is seriously challenging. This paper formulates the FJSSP as a Markov decision process (MDP) and solves it with deep reinforcement learning (DRL). First, we represent the state of the scheduling environment using seven feature vectors and use a transformer encoder as a feature extraction module to capture the relationships between state features and enhance representation capability. Second, based on the features of the jobs and machines, we design 16 composite dispatching rules along multiple dimensions, including job completion rate, processing time, waiting time, and manufacturing resource utilization, to achieve flexible and efficient scheduling decisions. Furthermore, we design an intuitive and dense reward function with the objective of minimizing the total idle time of machines. Finally, to verify the performance and feasibility of the algorithm, we evaluate the proposed policy model on the Brandimarte, Hurink, and Dauzere datasets. Our experimental results demonstrate that the proposed framework consistently outperforms traditional dispatching rules, surpasses metaheuristic methods on larger-scale instances, and exceeds the performance of existing DRL-based scheduling methods across most datasets.
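The abstract does not give the exact form of the reward, so the following is only a minimal sketch of one way a dense, idle-time-based reward could look. It assumes that each scheduling step appends one non-overlapping operation to a machine and that the step reward is the negative increase in total machine idle time. The names `Machine`, `total_idle`, and `step_reward` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    """Hypothetical machine record holding (start, end) busy intervals.

    Assumption: intervals on one machine never overlap.
    """
    busy: list = field(default_factory=list)

    def idle_time(self) -> float:
        """Idle time accumulated up to the end of the last scheduled operation."""
        if not self.busy:
            return 0.0
        last_end = max(end for _, end in self.busy)
        worked = sum(end - start for start, end in self.busy)
        return last_end - worked


def total_idle(machines) -> float:
    """Sum of idle time over all machines."""
    return sum(m.idle_time() for m in machines)


def step_reward(machines, machine_idx: int, start: float, end: float) -> float:
    """Schedule one operation and return the negative growth in total idle time.

    The reward is dense: every scheduling step yields a value, and the
    step rewards sum to the negative of the final total idle time.
    """
    before = total_idle(machines)
    machines[machine_idx].busy.append((start, end))
    after = total_idle(machines)
    return before - after
```

Because the per-step rewards telescope, maximizing their sum under this sketch is equivalent to minimizing the total idle time at the end of the schedule, which matches the objective stated in the abstract.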




Source journal

Electronics (Computer Science: Computer Networks and Communications)

CiteScore: 1.10
Self-citation rate: 10.30%
Publication volume: 3515
Review time: 16.71 days

Journal description: Electronics (ISSN 2079-9292; CODEN: ELECGJ) is an international, open access journal on the science of electronics and its applications, published quarterly online by MDPI.