Probabilistic Automata-Based Method for Enhancing Performance of Deep Reinforcement Learning Systems

IF 15.3 | Zone 1, Computer Science | JCR Q1, Automation & Control Systems | IEEE/CAA Journal of Automatica Sinica | Published: 2024-10-08 | DOI: 10.1109/JAS.2024.124818

Min Yang; Guanjun Liu; Ziyuan Zhou; Jiacun Wang

Vol. 11, No. 11, pp. 2327-2339

Abstract

Deep reinforcement learning (DRL) has demonstrated significant potential in industrial manufacturing domains such as workshop scheduling and energy system management. However, because of the model's inherent uncertainty, DRL must be rigorously validated before it is applied to real-world tasks. Specific tests may reveal performance inadequacies in pre-trained DRL models, while the “black-box” nature of DRL makes model behavior difficult to test. We propose a novel performance improvement framework based on probabilistic automata that proactively identifies and corrects critical vulnerabilities of DRL systems, so that the performance of DRL models in real tasks can be improved with minimal model modifications. First, a probabilistic automaton is constructed from the historical trajectories of the DRL system by abstracting states into probabilistic decision-making units (PDMUs), and a reverse breadth-first search (BFS) is used to identify the key PDMU-action pairs that have the greatest impact on adverse outcomes. This process relies only on the state-action sequence and the final result of each trajectory. Then, under the key PDMU, we search for the new action that has the greatest impact on favorable results. Finally, the key PDMU, the undesirable action, and the new action are encapsulated as monitors that guide the DRL system toward more favorable results through real-time monitoring and correction. Evaluations in two standard reinforcement learning environments and three real job scheduling scenarios confirmed the effectiveness of the method, providing a degree of assurance for the deployment of DRL models in real-world applications.
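The pipeline described above (build a probabilistic automaton over abstracted states, run a reverse BFS from the adverse outcome, pick a replacement action, wrap the result in a monitor) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how states are abstracted into PDMUs or how "impact" is measured, so this sketch assumes rounding-based abstraction (the hypothetical `abstract_state`) and uses the empirical failure rate of each PDMU-action pair, restricted to pairs reachable backwards from the adverse outcome, as a stand-in impact measure. All function names, the `min_support` threshold, and the toy trajectories are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical terminal nodes standing in for a trajectory's final result.
BAD, GOOD = "BAD", "GOOD"


def abstract_state(state, precision=0):
    """Assumed abstraction: rounding maps nearby states to the same PDMU."""
    return tuple(round(x, precision) for x in state)


def build_automaton(trajectories):
    """trajectories: list of (steps, success), steps = [(state, action), ...].
    Returns (pdmu, action) -> {next node: count} and (pdmu, action) -> [bad, good]."""
    edges = defaultdict(lambda: defaultdict(int))
    outcome = defaultdict(lambda: [0, 0])
    for steps, success in trajectories:
        nodes = [abstract_state(s) for s, _ in steps] + [GOOD if success else BAD]
        for i, (_, action) in enumerate(steps):
            edges[(nodes[i], action)][nodes[i + 1]] += 1
        # Count each pair once per trajectory, labeled by the final result only.
        for pair in {(nodes[i], a) for i, (_, a) in enumerate(steps)}:
            outcome[pair][1 if success else 0] += 1
    return edges, outcome


def reverse_bfs(edges, target=BAD):
    """Walk the automaton backwards from `target`; return {(pdmu, action): depth}."""
    preds = defaultdict(set)  # node -> pairs with an edge into that node
    for (u, a), successors in edges.items():
        for v in successors:
            preds[v].add((u, a))
    depth, visited = {}, {target}
    queue = deque([(target, 0)])
    while queue:
        node, d = queue.popleft()
        for u, a in preds[node]:
            depth.setdefault((u, a), d + 1)  # BFS order keeps the shortest depth
            if u not in visited:
                visited.add(u)
                queue.append((u, d + 1))
    return depth


def find_key_pair(edges, outcome, min_support=2):
    """Stand-in impact measure: highest failure rate, ties go to pairs nearer BAD."""
    depth = reverse_bfs(edges)
    candidates = [p for p in depth if sum(outcome[p]) >= min_support]
    if not candidates:
        return None
    return max(candidates, key=lambda p: (outcome[p][0] / sum(outcome[p]), -depth[p]))


def find_replacement(outcome, key_pdmu, bad_action, min_support=2):
    """Under the key PDMU, pick the alternative action with the best success rate."""
    alternatives = [(a, c) for (u, a), c in outcome.items()
                    if u == key_pdmu and a != bad_action and sum(c) >= min_support]
    if not alternatives:
        return None
    return max(alternatives, key=lambda ac: ac[1][1] / sum(ac[1]))[0]


class Monitor:
    """Wraps a policy and overrides the undesirable action at the key PDMU."""

    def __init__(self, policy, key_pdmu, bad_action, new_action):
        self.policy = policy
        self.key_pdmu, self.bad_action, self.new_action = key_pdmu, bad_action, new_action

    def act(self, state):
        action = self.policy(state)
        if abstract_state(state) == self.key_pdmu and action == self.bad_action:
            return self.new_action
        return action


# Toy usage on synthetic 1-D trajectories (not data from the paper).
TOY_TRAJECTORIES = [
    ([((0.1,), "L"), ((1.2,), "R")], False),
    ([((0.2,), "L"), ((0.9,), "R")], False),
    ([((0.0,), "L"), ((1.1,), "L")], True),
    ([((0.3,), "L"), ((0.8,), "L")], True),
    ([((0.1,), "R")], True),
]
edges, outcome = build_automaton(TOY_TRAJECTORIES)
key_pdmu, bad_action = find_key_pair(edges, outcome)
new_action = find_replacement(outcome, key_pdmu, bad_action)
monitor = Monitor(lambda s: "R", key_pdmu, bad_action, new_action)
print(key_pdmu, bad_action, new_action)  # (1.0,) R L
print(monitor.act((0.95,)), monitor.act((0.2,)))  # L R
```

The monitor leaves the underlying policy untouched and only intervenes when both the abstracted state and the proposed action match the identified key pair, which mirrors the "minimal model modifications" goal. The `min_support` threshold is an assumed safeguard so that a pair seen in only one unlucky trajectory does not trigger a correction; the paper may use a different criterion.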

Source journal

IEEE/CAA Journal of Automatica Sinica (Engineering: Control and Systems Engineering)

CiteScore: 23.50
Self-citation rate: 11.00%
Articles published: 880

Journal description: The IEEE/CAA Journal of Automatica Sinica is a reputable journal that publishes high-quality English-language papers on original theoretical and experimental research and development in automation. Its scope includes automatic control; artificial intelligence and intelligent control; systems theory and engineering; pattern recognition and intelligent systems; automation engineering and applications; information processing and information systems; network-based automation; robotics; sensing and measurement; and navigation, guidance, and control. The journal is abstracted and indexed in SCIE (Science Citation Index Expanded), EI (Engineering Index), Inspec, Scopus, SCImago, DBLP, CNKI (China National Knowledge Infrastructure), CSCD (Chinese Science Citation Database), and IEEE Xplore.