Synthesis of Opacity-Enforcing Supervisory Strategies Using Reinforcement Learning
Huimin Zhang; Li Huang; Wanling Huang; Lei Feng; Xianxian Li
DOI: 10.1109/TASE.2024.3456239
IEEE Transactions on Automation Science and Engineering, vol. 22, pp. 6896-6906
Published: 2024-09-17
Citations: 0
Abstract
In the control of discrete-event systems for current-state opacity enforcement, it is difficult to synthesize a supervisor with supervisory control theory (SCT) when no explicit formal model of the system is available. This study uses reinforcement learning (RL) to obtain supervisory policies for opacity enforcement when the automaton model of the system is unavailable. The state space of the RL environment is generated dynamically through system simulation. Actions are defined as the control patterns of SCT, and a reward function is proposed to evaluate whether the secret is exposed. As the simulation proceeds, a sequence of state-action-reward transitions is collected. The Q-learning and State-Action-Reward-State-Action (SARSA) frameworks are adopted to implement the proposed approach. The goal of training is to maximize the total accumulated reward by optimizing action selection during learning; an optimal supervisory policy is obtained when training converges. Experiments illustrate the effectiveness of the proposed approach. The contributions are twofold. First, a supervisor for opacity enforcement is learned by RL without an explicit formal model of the system. Second, the ability to compute supervisory policies without formal models addresses a significant gap in the literature and opens a new direction for research on opacity enforcement in discrete-event systems.

Note to Practitioners—Supervisory control theory (SCT) provides an effective way to synthesize supervisors, but it traditionally requires an explicit system model to enforce current-state opacity by restricting system behavior. In practice, formal models of systems are often confidential or otherwise unavailable. This paper presents a method for supervisor synthesis via reinforcement learning when no formal model of the system is available. The method leverages the control patterns of SCT and optimizes action selection during training through a reward mechanism that evaluates the secrecy of states. It can be applied within model-free RL frameworks such as Q-learning and SARSA, with training performed as the system simulation runs. Once training converges, the resulting policy can be used to enforce opacity for the system. Both Q-learning and SARSA, however, store Q-values in a Q-table whose size, in the worst case, grows exponentially with the number of states and controllable events; this can exhaust memory for large-scale systems. To make the approach scalable, we plan to use deep reinforcement learning (DRL) to train control policies for opacity enforcement in future work.
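The scheme described above can be sketched in code. The following is a minimal, illustrative tabular Q-learning sketch, not the paper's implementation: the toy plant (states s0–s4, secret state s3), the event sets, the reward values, and all function names are hypothetical choices. Actions are control patterns (subsets of controllable events, as in SCT), the environment is stepped by simulation, and the reward penalizes reaching a secret state.

```python
import itertools
import random
from collections import defaultdict

# Hypothetical toy plant (not from the paper): reaching s3 exposes the secret.
TRANSITIONS = {
    "s0": {"a": "s1", "b": "s2"},
    "s1": {"c": "s3"},
    "s2": {"c": "s4"},
    "s3": {},
    "s4": {},
}
SECRET = {"s3"}
CONTROLLABLE = ("a", "c")   # events the supervisor may disable; "b" is uncontrollable

# Actions are control patterns: every subset of the controllable events
# (2^|CONTROLLABLE| patterns, which is why the Q-table can blow up).
PATTERNS = [frozenset(c) for r in range(len(CONTROLLABLE) + 1)
            for c in itertools.combinations(CONTROLLABLE, r)]

def enabled_events(state, pattern):
    """Events the plant can fire: uncontrollable ones plus enabled controllable ones."""
    return [e for e in TRANSITIONS[state]
            if e not in CONTROLLABLE or e in pattern]

def train(episodes=3000, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)  # key: (state, pattern) -> Q-value
    for _ in range(episodes):
        state = "s0"
        while TRANSITIONS[state]:
            # epsilon-greedy selection over control patterns
            if rng.random() < eps:
                pattern = rng.choice(PATTERNS)
            else:
                pattern = max(PATTERNS, key=lambda p: Q[(state, p)])
            events = enabled_events(state, pattern)
            if not events:  # supervisor blocked the plant: episode ends
                break
            nxt = TRANSITIONS[state][rng.choice(events)]
            # Illustrative reward: heavy penalty if the secret is revealed,
            # small bonus for permitting progress.
            reward = -10.0 if nxt in SECRET else 1.0
            best_next = (max(Q[(nxt, p)] for p in PATTERNS)
                         if TRANSITIONS[nxt] else 0.0)
            Q[(state, pattern)] += alpha * (
                reward + gamma * best_next - Q[(state, pattern)])
            state = nxt
    return Q

def greedy_pattern(Q, state):
    """The learned supervisory decision at a state: the highest-valued pattern."""
    return max(PATTERNS, key=lambda p: Q[(state, p)])
```

After training, the greedy policy disables event "c" at s1 (which would expose the secret) while keeping it enabled at s2, mirroring how the learned supervisor restricts only secrecy-violating behavior. A SARSA variant would replace the `max` bootstrap with the Q-value of the action actually selected next.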
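The scaling concern in the note can be made concrete: with |S| states and n controllable events, the action set contains 2^n control patterns, so a worst-case tabular Q-table holds |S| x 2^n entries. The helper below (a hypothetical name, for illustration only) computes this count.

```python
# Worst-case Q-table sizing: one entry per (state, control pattern) pair,
# with 2^n control patterns for n controllable events.
def q_table_entries(num_states: int, num_controllable: int) -> int:
    return num_states * 2 ** num_controllable

# Even a modest plant grows quickly: 1000 states with 20 controllable
# events already needs over a billion table entries.
print(q_table_entries(1000, 20))  # 1048576000
```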
Journal Introduction
The IEEE Transactions on Automation Science and Engineering (T-ASE) publishes fundamental papers on Automation, emphasizing scientific results that advance efficiency, quality, productivity, and reliability. T-ASE encourages interdisciplinary approaches from computer science, control systems, electrical engineering, mathematics, mechanical engineering, operations research, and other fields. T-ASE welcomes results relevant to industries such as agriculture, biotechnology, healthcare, home automation, maintenance, manufacturing, pharmaceuticals, retail, security, service, supply chains, and transportation. T-ASE addresses a research community willing to integrate knowledge across disciplines and industries. For this purpose, each paper includes a Note to Practitioners that summarizes how its results can be applied or how they might be extended to apply in practice.