Quality Inspection Scheduling Problem Based on Reinforcement Learning Environment

Tao Xu, Kai Xu, Jiangming Zhang, Si-qing Yang, Jun-Heng Huang
{"title":"Quality Inspection Scheduling Problem Based on Reinforcement Learning Environment","authors":"Tao Xu, Kai Xu, Jiangming Zhang, Si-qing Yang, Jun-Heng Huang","doi":"10.1109/CEEPE58418.2023.10165918","DOIUrl":null,"url":null,"abstract":"Quantity inspection plays an important role in power metering. With the development of the digital construction of the quality inspection laboratory, the scheduling algorithm of quality inspection tasks requires higher scheduling efficiency and accuracy to meet the diverse needs of practical applications. Different from the traditional job-shop scheduling problem (JSP), there is no fixed corresponding relationship between samples and tasks in the quality inspection task scheduling problem (QISP), which means a higher degree of freedom of scheduling. At the same time, quality inspection tasks have more complex constraints such as serial, parallel, and mutual exclusion, which makes the existing scheduling algorithms cannot be directly applied. This paper builds a reinforcement learning (RL) based method for QISP. A new scheduling feature representation method is proposed to fully describe the state of quality inspection tasks and sample-device utilization. Aiming to solve the problem of sparse rewards, we present a reward function to integrate scheduling environment utilization rate and empty time. Considering the non-repetitive and complex constraints of quality inspection tasks, a set of action selection rules is proposed to replace the agent's direct learning of action decisions. Heuristic decide is used to improve the convergence speed of the algorithm and enhance the interpretability of the model's action selection. 
Compared with the traditional MWKR, GA, PSO algorithms, the RL-based method in this paper shows great advantages in solution quality and efficiency on the real data-set of a quality inspection laboratory of a state grid corporation.","PeriodicalId":431552,"journal":{"name":"2023 6th International Conference on Energy, Electrical and Power Engineering (CEEPE)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 6th International Conference on Energy, Electrical and Power Engineering (CEEPE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CEEPE58418.2023.10165918","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Quality inspection plays an important role in power metering. With the ongoing digitalization of quality inspection laboratories, scheduling algorithms for quality inspection tasks must deliver higher efficiency and accuracy to meet the diverse needs of practical applications. Unlike the traditional job-shop scheduling problem (JSP), the quality inspection task scheduling problem (QISP) has no fixed correspondence between samples and tasks, which gives the scheduler a higher degree of freedom. At the same time, quality inspection tasks carry more complex constraints, such as serial, parallel, and mutual-exclusion relations, so existing scheduling algorithms cannot be applied directly. This paper builds a reinforcement learning (RL) based method for QISP. A new scheduling feature representation is proposed to fully describe the state of quality inspection tasks and sample-device utilization. To address the problem of sparse rewards, we present a reward function that integrates the scheduling environment's utilization rate and idle (empty) time. Considering the non-repetitive and complex constraints of quality inspection tasks, a set of action selection rules is proposed to replace the agent's direct learning of action decisions. Heuristic decision rules are used to improve the convergence speed of the algorithm and to enhance the interpretability of the model's action selection. Compared with the traditional MWKR, GA, and PSO algorithms, the RL-based method shows clear advantages in solution quality and efficiency on a real dataset from a quality inspection laboratory of a state grid corporation.
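The abstract's reward design (combining device utilization with idle time to densify an otherwise sparse makespan signal) can be illustrated with a minimal sketch. Note this is an assumption-laden illustration, not the authors' exact formulation: the function name `step_reward`, the linear weighting, and the weights `w_util`/`w_idle` are all hypothetical.

```python
def step_reward(busy_time, idle_time, horizon, w_util=1.0, w_idle=0.5):
    """Per-step reward after scheduling one quality inspection task.

    busy_time -- cumulative time devices spent running tasks so far
    idle_time -- cumulative time devices sat empty so far
    horizon   -- elapsed schedule length so far (busy + idle, summed over devices)
    """
    if horizon <= 0:
        return 0.0
    utilization = busy_time / horizon   # fraction of time devices were busy
    idle_frac = idle_time / horizon     # fraction of time devices were empty
    # Reward high utilization and penalize idle time at every step,
    # so the agent gets feedback before the episode ends (dense signal).
    return w_util * utilization - w_idle * idle_frac

print(step_reward(busy_time=8.0, idle_time=2.0, horizon=10.0))
```

A dense shaping term like this is one common way to mitigate sparse rewards in scheduling environments, where the natural objective (final makespan) is only observable at episode end.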