{"title":"研究在复杂性不断增加的制造环境中,将单机和多机强化学习应用于动态调度:工业物联网测试平台综合案例研究","authors":"David Heik, Fouad Bahrpeyma, Dirk Reichelt","doi":"10.1016/j.jmsy.2024.09.019","DOIUrl":null,"url":null,"abstract":"<div><div>Industry 4.0, smart manufacturing and smart products have recently attracted substantial attention and are becoming increasingly prevalent in manufacturing systems. As a result of the successful implementation of these technologies, highly customized products can be manufactured using responsive, autonomous manufacturing processes at a competitive cost. This study was conducted at HTW Dresden’s Industrial Internet of Things Test Bed, which simulates state-of-the-art manufacturing scenarios for educational and research purposes. Apart from the physical production facility itself, the associated operational information systems have been fully interconnected in order to allow fast and efficient information exchange between the various manufacturing stages and systems. The presence of this characteristic provides a strong foundation for dealing appropriately with unexpected or planned environmental changes, as well as prevailing uncertainty, which greatly increases the overall system’s resilience. The main objective of this study is to increase the efficiency of the manufacturing system in order to optimize resource consumption and minimize the overall completion time (makespan). This manuscript discusses our experiments in the area of flexible job-shop scheduling problems (FJSP). As part of our research, different methods of representing the state space were explored, heuristic, meta-heuristic, reinforcement learning (RL), and multi-agent reinforcement learning (MARL) methods were evaluated, and various methods of interaction with the system (designing the action space and filtering in certain situations) were examined. Furthermore, the design of the reward function, which plays an important role in the formulation of the dynamic scheduling problem into an RL problem, has been discussed in depth. Finally, this paper studies the effectiveness of single-agent and multi-agent RL approaches, with a special focus on the Proximal Policy Optimization (PPO) method, on the fully-fledged digital twin of an industrial IoT system at HTW Dresden. As a result of our experiments, in a multi-agent setting involving individual agents for each manufacturing operation, PPO was able to manage the resources in such a way as to improve the manufacturing system’s performance significantly.</div></div>","PeriodicalId":16227,"journal":{"name":"Journal of Manufacturing Systems","volume":"77 ","pages":"Pages 525-557"},"PeriodicalIF":12.2000,"publicationDate":"2024-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Study on the application of single-agent and multi-agent reinforcement learning to dynamic scheduling in manufacturing environments with growing complexity: Case study on the synthesis of an industrial IoT Test Bed\",\"authors\":\"David Heik, Fouad Bahrpeyma, Dirk Reichelt\",\"doi\":\"10.1016/j.jmsy.2024.09.019\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Industry 4.0, smart manufacturing and smart products have recently attracted substantial attention and are becoming increasingly prevalent in manufacturing systems. As a result of the successful implementation of these technologies, highly customized products can be manufactured using responsive, autonomous manufacturing processes at a competitive cost. 
This study was conducted at HTW Dresden’s Industrial Internet of Things Test Bed, which simulates state-of-the-art manufacturing scenarios for educational and research purposes. Apart from the physical production facility itself, the associated operational information systems have been fully interconnected in order to allow fast and efficient information exchange between the various manufacturing stages and systems. The presence of this characteristic provides a strong foundation for dealing appropriately with unexpected or planned environmental changes, as well as prevailing uncertainty, which greatly increases the overall system’s resilience. The main objective of this study is to increase the efficiency of the manufacturing system in order to optimize resource consumption and minimize the overall completion time (makespan). This manuscript discusses our experiments in the area of flexible job-shop scheduling problems (FJSP). As part of our research, different methods of representing the state space were explored, heuristic, meta-heuristic, reinforcement learning (RL), and multi-agent reinforcement learning (MARL) methods were evaluated, and various methods of interaction with the system (designing the action space and filtering in certain situations) were examined. Furthermore, the design of the reward function, which plays an important role in the formulation of the dynamic scheduling problem into an RL problem, has been discussed in depth. Finally, this paper studies the effectiveness of single-agent and multi-agent RL approaches, with a special focus on the Proximal Policy Optimization (PPO) method, on the fully-fledged digital twin of an industrial IoT system at HTW Dresden. As a result of our experiments, in a multi-agent setting involving individual agents for each manufacturing operation, PPO was able to manage the resources in such a way as to improve the manufacturing system’s performance significantly.</div></div>\",\"PeriodicalId\":16227,\"journal\":{\"name\":\"Journal of Manufacturing Systems\",\"volume\":\"77 \",\"pages\":\"Pages 525-557\"},\"PeriodicalIF\":12.2000,\"publicationDate\":\"2024-10-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Manufacturing Systems\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0278612524002206\",\"RegionNum\":1,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, INDUSTRIAL\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Manufacturing Systems","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0278612524002206","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, INDUSTRIAL","Score":null,"Total":0}
Study on the application of single-agent and multi-agent reinforcement learning to dynamic scheduling in manufacturing environments with growing complexity: Case study on the synthesis of an industrial IoT Test Bed
Industry 4.0, smart manufacturing and smart products have recently attracted substantial attention and are becoming increasingly prevalent in manufacturing systems. As a result of the successful implementation of these technologies, highly customized products can be manufactured using responsive, autonomous manufacturing processes at a competitive cost. This study was conducted at HTW Dresden's Industrial Internet of Things Test Bed, which simulates state-of-the-art manufacturing scenarios for educational and research purposes. Apart from the physical production facility itself, the associated operational information systems have been fully interconnected to allow fast and efficient information exchange between the various manufacturing stages and systems. This characteristic provides a strong foundation for dealing appropriately with unexpected or planned environmental changes, as well as prevailing uncertainty, and greatly increases the overall system's resilience. The main objective of this study is to increase the efficiency of the manufacturing system in order to optimize resource consumption and minimize the overall completion time (makespan). This manuscript discusses our experiments on the flexible job-shop scheduling problem (FJSP). As part of our research, different methods of representing the state space were explored; heuristic, meta-heuristic, reinforcement learning (RL), and multi-agent reinforcement learning (MARL) methods were evaluated; and various methods of interaction with the system (designing the action space and filtering actions in certain situations) were examined. Furthermore, the design of the reward function, which plays an important role in formulating the dynamic scheduling problem as an RL problem, is discussed in depth. Finally, this paper studies the effectiveness of single-agent and multi-agent RL approaches, with a special focus on the Proximal Policy Optimization (PPO) method, on the fully fledged digital twin of an industrial IoT system at HTW Dresden. Our experiments show that, in a multi-agent setting with an individual agent for each manufacturing operation, PPO was able to manage the resources in a way that significantly improved the manufacturing system's performance.
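To make the RL formulation sketched in the abstract more concrete, the following is a minimal, self-contained sketch (not the authors' implementation) of how a single FJSP dispatching step can be cast as an RL environment: the observation encodes each job's progress and each machine's availability, the action assigns a pending operation of one job to one capable machine, and the reward penalizes any growth of the makespan. The class name ToyFJSPEnv, the toy job data, the observation layout and the reward shaping are illustrative assumptions; the paper's actual state, action and reward designs (including action filtering) are considerably richer. A standard PPO implementation, for example from stable-baselines3, could be trained against such an environment.

# Minimal FJSP-as-RL sketch; all design choices below are illustrative assumptions.
import numpy as np
import gymnasium as gym
from gymnasium import spaces


class ToyFJSPEnv(gym.Env):
    """One pending operation is dispatched to a machine on every step."""

    def __init__(self, jobs):
        # jobs[j][o] is a dict {machine_id: processing_time} listing the
        # machines capable of executing operation o of job j (flexible routing).
        super().__init__()
        self.jobs = jobs
        self.n_jobs = len(jobs)
        self.n_machines = 1 + max(m for job in jobs for op in job for m in op)
        # Action: pick a (job, machine) pair; infeasible picks are penalized in step().
        self.action_space = spaces.MultiDiscrete([self.n_jobs, self.n_machines])
        # Observation: next-operation index per job + time each machine becomes free.
        self.observation_space = spaces.Box(
            low=0.0, high=np.inf,
            shape=(self.n_jobs + self.n_machines,), dtype=np.float32,
        )

    def _obs(self):
        return np.array(self.next_op + self.machine_free, dtype=np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.next_op = [0] * self.n_jobs          # next unscheduled operation per job
        self.job_ready = [0.0] * self.n_jobs      # completion time of each job's last op
        self.machine_free = [0.0] * self.n_machines
        self.makespan = 0.0
        return self._obs(), {}

    def step(self, action):
        job, machine = int(action[0]), int(action[1])
        ops = self.jobs[job]
        op_idx = self.next_op[job]
        if op_idx >= len(ops) or machine not in ops[op_idx]:
            # Infeasible choice (job finished or machine not capable): small penalty.
            return self._obs(), -1.0, False, False, {}
        start = max(self.job_ready[job], self.machine_free[machine])
        finish = start + ops[op_idx][machine]
        self.job_ready[job] = finish
        self.machine_free[machine] = finish
        self.next_op[job] += 1
        # Reward is the negative growth of the makespan, so minimizing the
        # overall completion time maximizes the cumulative return.
        new_makespan = max(self.makespan, finish)
        reward = -(new_makespan - self.makespan)
        self.makespan = new_makespan
        done = all(self.next_op[j] >= len(self.jobs[j]) for j in range(self.n_jobs))
        return self._obs(), reward, done, False, {}


if __name__ == "__main__":
    # Two toy jobs with two operations each; each operation can run on one or two machines.
    jobs = [
        [{0: 3.0, 1: 5.0}, {1: 2.0}],
        [{1: 4.0}, {0: 6.0, 1: 3.0}],
    ]
    env = ToyFJSPEnv(jobs)
    obs, _ = env.reset(seed=0)
    done = False
    while not done:
        obs, reward, done, _, _ = env.step(env.action_space.sample())
    print("random-policy makespan:", env.makespan)

The sketch only covers the single-agent case; in the multi-agent setting described in the abstract, an individual policy would be instantiated per manufacturing operation rather than a single global dispatcher.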
About the journal:
The Journal of Manufacturing Systems is dedicated to showcasing cutting-edge fundamental and applied research in manufacturing at the systems level. Encompassing products, equipment, people, information, control, and support functions, manufacturing systems play a pivotal role in the economical and competitive development, production, delivery, and total lifecycle of products, meeting market and societal needs.
With a commitment to publishing archival scholarly literature, the journal strives to advance the state of the art in manufacturing systems and foster innovation in crafting efficient, robust, and sustainable manufacturing systems. The focus extends from equipment-level considerations to the broader scope of the extended enterprise. The Journal welcomes research addressing challenges across various scales, including nano, micro, and macro-scale manufacturing, and spanning diverse sectors such as aerospace, automotive, energy, and medical device manufacturing.