Synthesis of Opacity-Enforcing Supervisory Strategies Using Reinforcement Learning
Huimin Zhang; Li Huang; Wanling Huang; Lei Feng; Xianxian Li
DOI: 10.1109/TASE.2024.3456239
IEEE Transactions on Automation Science and Engineering, vol. 22, pp. 6896-6906
Published: 2024-09-17
Citations: 0
Abstract
In the control of discrete-event systems for current-state opacity enforcement, it is difficult to synthesize a supervisor with supervisory control theory (SCT) when no explicit formal model of the system is available. This study uses reinforcement learning (RL) to obtain supervisory policies for opacity enforcement when the automaton model of the system is unavailable. The state space of the RL environment is generated dynamically through system simulation. Actions are defined as the control patterns of SCT, and a reward function is proposed to evaluate whether the secret is exposed. As the simulation proceeds, a sequence of state-action-reward transitions is collected. The Q-learning and State-Action-Reward-State-Action (SARSA) frameworks are adopted to implement the proposed approach. The goal of training is to maximize the total accumulated reward by optimizing action selection during learning; an optimal supervisory policy is obtained when training converges. Experiments illustrate the effectiveness of the proposed approach. The contributions are twofold. First, a supervisor for opacity enforcement is learned by RL without an explicit formal model of the system. Second, the ability to compute supervisory policies without formal models addresses a significant gap in the literature and opens a new direction for research on opacity enforcement in discrete-event systems.

Note to Practitioners—Supervisory control theory (SCT) provides an effective way to synthesize supervisors, but it traditionally requires an explicit system model to enforce current-state opacity by restricting system behavior. In practice, formal models of systems are often confidential or otherwise unavailable. This paper presents a method for supervisor synthesis via reinforcement learning when no formal model of the system is available. The method leverages the control patterns of SCT and optimizes action selection during training through a reward mechanism that evaluates the secrecy of states. It can be applied within model-free RL frameworks such as Q-learning and SARSA, with training performed as the system simulation runs. Once training converges, the resulting policy can be used to enforce opacity for the system. Both Q-learning and SARSA, however, store Q-values in a Q-table whose size, in the worst case, grows exponentially with the number of states and controllable events; this can exhaust memory for large-scale systems. To make the approach scalable, we plan to use deep reinforcement learning (DRL) to train control policies for opacity enforcement in future work.
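The scheme described above can be sketched in code. The following is a minimal, illustrative tabular Q-learning sketch, not the paper's implementation: the toy plant (states s0–s4, secret state s3), the event sets, the reward values, and all function names are hypothetical choices. Actions are control patterns (subsets of controllable events, as in SCT), the environment is stepped by simulation, and the reward penalizes reaching a secret state.

```python
import itertools
import random
from collections import defaultdict

# Hypothetical toy plant (not from the paper): reaching s3 exposes the secret.
TRANSITIONS = {
    "s0": {"a": "s1", "b": "s2"},
    "s1": {"c": "s3"},
    "s2": {"c": "s4"},
    "s3": {},
    "s4": {},
}
SECRET = {"s3"}
CONTROLLABLE = ("a", "c")   # events the supervisor may disable; "b" is uncontrollable

# Actions are control patterns: every subset of the controllable events
# (2^|CONTROLLABLE| patterns, which is why the Q-table can blow up).
PATTERNS = [frozenset(c) for r in range(len(CONTROLLABLE) + 1)
            for c in itertools.combinations(CONTROLLABLE, r)]

def enabled_events(state, pattern):
    """Events the plant can fire: uncontrollable ones plus enabled controllable ones."""
    return [e for e in TRANSITIONS[state]
            if e not in CONTROLLABLE or e in pattern]

def train(episodes=3000, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)  # key: (state, pattern) -> Q-value
    for _ in range(episodes):
        state = "s0"
        while TRANSITIONS[state]:
            # epsilon-greedy selection over control patterns
            if rng.random() < eps:
                pattern = rng.choice(PATTERNS)
            else:
                pattern = max(PATTERNS, key=lambda p: Q[(state, p)])
            events = enabled_events(state, pattern)
            if not events:  # supervisor blocked the plant: episode ends
                break
            nxt = TRANSITIONS[state][rng.choice(events)]
            # Illustrative reward: heavy penalty if the secret is revealed,
            # small bonus for permitting progress.
            reward = -10.0 if nxt in SECRET else 1.0
            best_next = (max(Q[(nxt, p)] for p in PATTERNS)
                         if TRANSITIONS[nxt] else 0.0)
            Q[(state, pattern)] += alpha * (
                reward + gamma * best_next - Q[(state, pattern)])
            state = nxt
    return Q

def greedy_pattern(Q, state):
    """The learned supervisory decision at a state: the highest-valued pattern."""
    return max(PATTERNS, key=lambda p: Q[(state, p)])
```

After training, the greedy policy disables event "c" at s1 (which would expose the secret) while keeping it enabled at s2, mirroring how the learned supervisor restricts only secrecy-violating behavior. A SARSA variant would replace the `max` bootstrap with the Q-value of the action actually selected next.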
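The scaling concern in the note can be made concrete: with |S| states and n controllable events, the action set contains 2^n control patterns, so a worst-case tabular Q-table holds |S| x 2^n entries. The helper below (a hypothetical name, for illustration only) computes this count.

```python
# Worst-case Q-table sizing: one entry per (state, control pattern) pair,
# with 2^n control patterns for n controllable events.
def q_table_entries(num_states: int, num_controllable: int) -> int:
    return num_states * 2 ** num_controllable

# Even a modest plant grows quickly: 1000 states with 20 controllable
# events already needs over a billion table entries.
print(q_table_entries(1000, 20))  # 1048576000
```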
Journal Introduction
The IEEE Transactions on Automation Science and Engineering (T-ASE) publishes fundamental papers on Automation, emphasizing scientific results that advance efficiency, quality, productivity, and reliability. T-ASE encourages interdisciplinary approaches from computer science, control systems, electrical engineering, mathematics, mechanical engineering, operations research, and other fields. T-ASE welcomes results relevant to industries such as agriculture, biotechnology, healthcare, home automation, maintenance, manufacturing, pharmaceuticals, retail, security, service, supply chains, and transportation. T-ASE addresses a research community willing to integrate knowledge across disciplines and industries. For this purpose, each paper includes a Note to Practitioners that summarizes how its results can be applied or how they might be extended to apply in practice.