Preference learning based deep reinforcement learning for flexible job shop scheduling problem

Complex & Intelligent Systems · Pub Date: 2025-01-15 · DOI: 10.1007/s40747-024-01772-x · Impact Factor: 5.0 · JCR Q1 (Computer Science, Artificial Intelligence) · CAS Zone 2 (Computer Science)
Xinning Liu, Li Han, Ling Kang, Jiannan Liu, Huadong Miao
{"title":"Preference learning based deep reinforcement learning for flexible job shop scheduling problem","authors":"Xinning Liu, Li Han, Ling Kang, Jiannan Liu, Huadong Miao","doi":"10.1007/s40747-024-01772-x","DOIUrl":null,"url":null,"abstract":"<p>The flexible job shop scheduling problem (FJSP) holds significant importance in both theoretical research and practical applications. Given the complexity and diversity of FJSP, improving the generalization and quality of scheduling methods has become a hot topic of interest in both industry and academia. To address this, this paper proposes a Preference-Based Mask-PPO (PBMP) algorithm, which leverages the strengths of preference learning and invalid action masking to optimize FJSP solutions. First, a reward predictor based on preference learning is designed to model reward prediction by comparing random fragments, eliminating the need for complex reward function design. Second, a novel intelligent switching mechanism is introduced, where proximal policy optimization (PPO) is employed to enhance exploration during sampling, and masked proximal policy optimization (Mask-PPO) refines the action space during training, significantly improving efficiency and solution quality. Furthermore, the Pearson correlation coefficient (PCC) is used to evaluate the performance of the preference model. Finally, comparative experiments on FJSP benchmark instances of varying sizes demonstrate that PBMP outperforms traditional scheduling strategies such as dispatching rules, OR-Tools, and other deep reinforcement learning (DRL) algorithms, achieving superior scheduling policies and faster convergence. Even with increasing instance sizes, preference learning proves to be an effective reward mechanism in reinforcement learning for FJSP. The ablation study further highlights the advantages of each key component in the PBMP algorithm across performance metrics.</p>","PeriodicalId":10524,"journal":{"name":"Complex & Intelligent Systems","volume":"74 2 Pt 2 1","pages":""},"PeriodicalIF":5.0000,"publicationDate":"2025-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Complex & Intelligent Systems","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s40747-024-01772-x","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

The flexible job shop scheduling problem (FJSP) holds significant importance in both theoretical research and practical applications. Given the complexity and diversity of FJSP, improving the generalization and quality of scheduling methods has become a topic of keen interest in both industry and academia. To this end, this paper proposes a Preference-Based Mask-PPO (PBMP) algorithm, which leverages the strengths of preference learning and invalid action masking to optimize FJSP solutions. First, a reward predictor based on preference learning is designed to model reward prediction by comparing random fragments, eliminating the need for complex reward function design. Second, a novel intelligent switching mechanism is introduced, where proximal policy optimization (PPO) is employed to enhance exploration during sampling, and masked proximal policy optimization (Mask-PPO) refines the action space during training, significantly improving efficiency and solution quality. Furthermore, the Pearson correlation coefficient (PCC) is used to evaluate the performance of the preference model. Finally, comparative experiments on FJSP benchmark instances of varying sizes demonstrate that PBMP outperforms traditional scheduling strategies such as dispatching rules, OR-Tools, and other deep reinforcement learning (DRL) algorithms, achieving superior scheduling policies and faster convergence. Even with increasing instance sizes, preference learning proves to be an effective reward mechanism in reinforcement learning for FJSP. The ablation study further highlights the advantages of each key component in the PBMP algorithm across performance metrics.
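
The abstract describes two core mechanisms: a reward predictor trained from pairwise preferences over trajectory fragments (removing the need for a hand-crafted reward function), and invalid action masking that restricts the policy to feasible operations. The sketch below illustrates both ideas in PyTorch; it is a minimal sketch under stated assumptions, and the class names, network sizes, and segment lengths are illustrative rather than taken from the paper.

```python
# Illustrative sketch (not the authors' implementation) of:
# (1) a preference-based reward predictor trained on pairwise segment
#     comparisons, Bradley-Terry style, and
# (2) invalid-action masking applied to policy logits before sampling.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RewardPredictor(nn.Module):
    """Maps a state-action feature vector to a scalar predicted reward."""

    def __init__(self, feature_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feature_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, segment: torch.Tensor) -> torch.Tensor:
        # segment: (T, feature_dim) -> summed predicted return of the fragment
        return self.net(segment).sum()


def preference_loss(model: RewardPredictor,
                    seg_a: torch.Tensor,
                    seg_b: torch.Tensor,
                    pref: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry cross-entropy; pref = 1.0 if fragment A is preferred, else 0.0."""
    logits = torch.stack([model(seg_a), model(seg_b)])   # (2,)
    log_probs = F.log_softmax(logits, dim=0)
    return -(pref * log_probs[0] + (1.0 - pref) * log_probs[1])


def mask_logits(logits: torch.Tensor, valid_mask: torch.Tensor) -> torch.Tensor:
    """Set logits of invalid actions to -inf so they receive zero probability."""
    return logits.masked_fill(~valid_mask, float("-inf"))


if __name__ == "__main__":
    torch.manual_seed(0)
    model = RewardPredictor(feature_dim=8)
    seg_a, seg_b = torch.randn(20, 8), torch.randn(20, 8)   # two random fragments
    loss = preference_loss(model, seg_a, seg_b, torch.tensor(1.0))
    loss.backward()                                          # updates the reward predictor

    logits = torch.randn(5)                                  # 5 candidate operations
    valid = torch.tensor([True, False, True, True, False])
    probs = torch.softmax(mask_logits(logits, valid), dim=0) # invalid actions get 0 probability
```

For the PCC evaluation mentioned in the abstract, predicted fragment returns could be correlated against a reference signal with, for example, numpy.corrcoef or scipy.stats.pearsonr; how the paper constructs that reference is not specified here.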

Source journal: Complex & Intelligent Systems (Computer Science, Artificial Intelligence)
CiteScore: 9.60 · Self-citation rate: 10.30% · Articles per year: 297
Journal description: Complex & Intelligent Systems aims to provide a forum for presenting and discussing novel approaches, tools and techniques meant for attaining a cross-fertilization between the broad fields of complex systems, computational simulation, and intelligent analytics and visualization. The transdisciplinary research that the journal focuses on will expand the boundaries of our understanding by investigating the principles and processes that underlie many of the most profound problems facing society today.