Reinforcement learning based maintenance scheduling of flexible multi-machine manufacturing systems with varying interactive degradation

IF 11 · Zone 1 (Engineering & Technology) · JCR Q1 (ENGINEERING, INDUSTRIAL) · Reliability Engineering & System Safety · Pub Date: 2025-08-01 · Epub Date: 2025-03-22 · DOI: 10.1016/j.ress.2025.111018

Jiangxi Chen, Xiaojun Zhou

Reliability Engineering & System Safety, Volume 260, Article 111018. Journal Article. https://www.sciencedirect.com/science/article/pii/S0951832025002194

Citations: 0

Abstract

In flexible multi-machine manufacturing systems, variations in product types dynamically influence machine loads, subsequently affecting the degradation processes of the machines. Moreover, the interactive degradation between the upstream and downstream machines, caused by the product quality deviations, changes with the different production routes for the variable product types. These factors, combined with the uncertain production schedules, present significant challenges for effective maintenance scheduling. To address these challenges, the maintenance scheduling problem is modeled as a Hidden-Mode Markov Decision Process (HM-MDP), where product types are treated as hidden modes that influence machine degradation and the subsequent maintenance decisions. The Interactive Degradation-Aware Proximal Policy Optimization (IDAPPO) reinforcement learning framework is introduced, enhancing the PPO algorithm with Graph Neural Networks (GNNs) to capture interactive degradation among machines and Long Short-Term Memory (LSTM) networks to handle temporal variations in production schedules. An entropy-based exploration strategy further manages the uncertainty of production schedules, enabling IDAPPO to adaptively optimize maintenance actions. Extensive experiments on both small-scale (5-machine) and large-scale (24-machine) systems demonstrate significantly reduced system losses and accelerated convergence of IDAPPO compared to the baseline approaches. These results indicate that IDAPPO provides a scalable and adaptive solution for improving the efficiency and reliability of complex manufacturing environments.
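The abstract names PPO with an entropy-based exploration strategy but gives no formulas. As a rough illustration only, the sketch below computes the standard PPO clipped surrogate loss with a plain entropy bonus. It is not the authors' IDAPPO implementation: the GNN and LSTM components are omitted, and the function name ppo_clip_loss, its parameters, and the default values clip_eps=0.2 and entropy_coef=0.01 are assumptions chosen for this example.

```python
import math


def ppo_clip_loss(new_logp, old_logp, advantages, action_probs,
                  clip_eps=0.2, entropy_coef=0.01):
    """Return the PPO loss (to be minimized) for a batch of sampled actions.

    new_logp, old_logp: log-probabilities of the taken actions under the
        current policy and the policy that collected the data.
    advantages: advantage estimates for the same samples.
    action_probs: full action distribution of the current policy per sample,
        used only for the entropy bonus.
    All names and defaults here are hypothetical, not taken from the paper.
    """
    n = len(advantages)
    surrogate = 0.0
    entropy = 0.0
    for lp_new, lp_old, adv, probs in zip(new_logp, old_logp,
                                          advantages, action_probs):
        # Probability ratio between the current and the old policy.
        ratio = math.exp(lp_new - lp_old)
        # Keep the ratio inside [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Pessimistic (minimum) choice between unclipped and clipped terms.
        surrogate += min(ratio * adv, clipped * adv)
        # Shannon entropy of the action distribution; higher means more exploration.
        entropy += -sum(p * math.log(p) for p in probs if p > 0.0)
    # Maximize surrogate and entropy, i.e. minimize their negatives.
    return -(surrogate / n) - entropy_coef * (entropy / n)
```

In a maintenance setting, each sample would correspond to one decision (for example, maintain or keep running a machine). The entropy term discourages the policy from collapsing onto a single action too early, which is one common way to keep exploring when the production schedule is uncertain.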

Source journal

Reliability Engineering & System Safety (Management Science; Engineering, Industrial)

CiteScore: 15.20
Self-citation rate: 39.50%
Articles published: 621
Review time: 67 days

About the journal: Elsevier publishes Reliability Engineering & System Safety in association with the European Safety and Reliability Association and the Safety Engineering and Risk Analysis Division. The international journal is devoted to developing and applying methods to enhance the safety and reliability of complex technological systems, like nuclear power plants, chemical plants, hazardous waste facilities, space systems, offshore and maritime systems, transportation systems, constructed infrastructure, and manufacturing plants. The journal normally publishes only articles that involve the analysis of substantive problems related to the reliability of complex systems or present techniques and/or theoretical results that have a discernible relationship to the solution of such problems. An important aim is to balance academic material and practical applications.