Quality Inspection Scheduling Problem Based on Reinforcement Learning Environment

Tao Xu, Kai Xu, Jiangming Zhang, Si-qing Yang, Jun-Heng Huang

2023 6th International Conference on Energy, Electrical and Power Engineering (CEEPE), published 2023-05-12
DOI: 10.1109/CEEPE58418.2023.10165918
Citations: 0
Abstract
Quality inspection plays an important role in power metering. With the ongoing digital construction of quality inspection laboratories, the scheduling of quality inspection tasks demands higher efficiency and accuracy to meet the diverse needs of practical applications. Unlike the traditional job-shop scheduling problem (JSP), the quality inspection scheduling problem (QISP) has no fixed correspondence between samples and tasks, which allows a higher degree of scheduling freedom. At the same time, quality inspection tasks carry more complex constraints, such as serial, parallel, and mutual-exclusion constraints, so existing scheduling algorithms cannot be applied directly. This paper builds a reinforcement learning (RL) based method for QISP. A new scheduling feature representation is proposed to fully describe the state of quality inspection tasks and sample-device utilization. To address the problem of sparse rewards, we present a reward function that integrates the scheduling environment's utilization rate and idle ("empty") time. Considering the non-repetitive and complex constraints of quality inspection tasks, a set of action selection rules is proposed to replace the agent's direct learning of action decisions. Heuristic decision rules are used to improve the convergence speed of the algorithm and enhance the interpretability of the model's action selection. Compared with the traditional MWKR, GA, and PSO algorithms, the RL-based method shows clear advantages in solution quality and efficiency on a real dataset from a quality inspection laboratory of a state grid corporation.
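The abstract's two central ideas — a dense reward combining utilization rate with idle time, and rule-based action selection replacing learned action decisions — can be illustrated with a minimal sketch. All function names, weights, and task fields below are illustrative assumptions, not the paper's actual formulation.

```python
# Hypothetical sketch of the two ideas in the abstract. The reward
# densifies the sparse scheduling signal by rewarding utilization and
# penalizing idle ("empty") time; the action filter encodes device and
# mutual-exclusion constraints as rules instead of learned decisions.

def step_reward(busy_time, idle_time, horizon, w_util=1.0, w_idle=0.5):
    """Per-step reward: reward device utilization, penalize idle time.

    w_util and w_idle are assumed weights; the paper's exact
    combination of the two terms is not given in the abstract.
    """
    if horizon <= 0:
        return 0.0
    utilization = busy_time / horizon
    idle_fraction = idle_time / horizon
    return w_util * utilization - w_idle * idle_fraction


def feasible_actions(tasks, busy_devices, running_groups):
    """Rule-based action selection: keep only tasks whose required
    device is free and whose mutual-exclusion group is not already
    running, so the agent never has to learn infeasible choices."""
    return [
        t for t in tasks
        if t["device"] not in busy_devices
        and t["group"] not in running_groups
    ]
```

Masking infeasible actions this way shrinks the decision space the agent explores, which is consistent with the abstract's claim of faster convergence and more interpretable action selection.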