Minimum-Cost State-Flipped Control for Reachability of Boolean Control Networks Using Reinforcement Learning

IEEE Transactions on Cybernetics, vol. 54, no. 11, pp. 7103-7115
Published: 2024-09-17 | DOI: 10.1109/TCYB.2024.3454253
Impact Factor: 9.4 | CAS Tier 1 (Computer Science) | JCR Q1 (Automation & Control Systems)
Authors: Jingjie Ni; Yang Tang; Fangfei Li
Citations: 0

Abstract

This article proposes model-free reinforcement learning methods for minimum-cost state-flipped control in Boolean control networks (BCNs). We tackle two questions: 1) finding the flipping kernel, namely, the flip set with the smallest cardinality ensuring reachability and 2) deriving optimal policies to minimize the number of flipping actions for reachability based on the obtained flipping kernel. For Question 1), Q-learning’s capability in determining reachability is demonstrated. To expedite convergence, we incorporate two improvements: 1) demonstrating that previously reachable states remain reachable after adding elements to the flip set, followed by employing transfer learning and 2) initiating each episode with special initial states whose reachability to the target state set is currently unknown. For Question 2), it is challenging to encapsulate the objective of simultaneously reducing control costs and satisfying terminal constraints exclusively through the reward function employed in the Q-learning framework. To bridge this gap, we propose a BCN-characteristics-based reward scheme and prove its optimality. Questions 1) and 2) with large-scale BCNs are addressed by employing small memory Q-learning, which reduces memory usage by only recording visited action-values. An upper bound on memory usage is provided to assess the algorithm’s feasibility. To expedite convergence for Question 2) in large-scale BCNs, we introduce adaptive variable rewards based on the known maximum steps needed to reach the target state set without cycles. Finally, the effectiveness of the proposed methods is validated on both small- and large-scale BCNs.
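As a rough illustration of the dictionary-based ("small memory") Q-learning idea described in the abstract, the Python sketch below stores the Q-table as a plain dict, so memory grows only with the state-action pairs actually visited. The three-node network, its update rules, the flip set, the target set, and the reward values are hypothetical placeholders for illustration, not the authors' benchmarks; in particular, the -len(flip) term only stands in for the flipping cost, whereas the article uses a BCN-characteristics-based reward scheme with proven optimality and adaptive variable rewards.

# Illustrative sketch (not the authors' code): small-memory Q-learning for
# reachability of a toy 3-node Boolean control network under state-flipped control.
import random

def bcn_step(x, u, flip):
    """One transition of a toy BCN: flip the selected nodes, then apply
    assumed Boolean update rules (placeholders, for illustration only)."""
    x = tuple(xi ^ (i in flip) for i, xi in enumerate(x))  # state-flipped control
    x1 = x[1] and u
    x2 = x[0] or x[2]
    x3 = not x[1]
    return (int(x1), int(x2), int(x3))

TARGET = {(1, 1, 0)}                         # assumed target state set
ACTIONS = [(u, frozenset(s))                 # joint action: control input + flip subset
           for u in (0, 1)
           for s in ((), (0,), (2,), (0, 2))]  # assumed flipping kernel {node 0, node 2}

Q = {}                                       # only visited action-values get an entry
alpha, gamma, eps = 0.1, 0.95, 0.2

def q(x, a):
    return Q.get((x, a), 0.0)

def choose(x):
    if random.random() < eps:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q(x, a))

for episode in range(5000):
    x = tuple(random.randint(0, 1) for _ in range(3))  # random initial state
    for _ in range(20):                                 # episode horizon
        u, flip = choose(x)
        x_next = bcn_step(x, u, flip)
        # Reward sketch: penalize each flip (control cost), reward reaching the target.
        r = -len(flip) + (10 if x_next in TARGET else 0)
        best_next = max(q(x_next, a) for a in ACTIONS)
        Q[(x, (u, flip))] = q(x, (u, flip)) + alpha * (r + gamma * best_next - q(x, (u, flip)))
        x = x_next
        if x in TARGET:
            break

print("action-values stored:", len(Q))       # memory grows only with visited pairs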
About the Journal

IEEE Transactions on Cybernetics (Computer Science, Artificial Intelligence; Computer Science, Cybernetics)
CiteScore: 25.40 | Self-citation rate: 11.00% | Articles per year: 1869

Journal scope: The scope of the IEEE Transactions on Cybernetics includes computational approaches to the field of cybernetics. Specifically, the transactions welcomes papers on communication and control across machines, or between machines, humans, and organizations. The scope includes such areas as computational intelligence, computer vision, neural networks, genetic algorithms, machine learning, fuzzy systems, cognitive systems, decision making, and robotics, to the extent that they contribute to the theme of cybernetics or demonstrate an application of cybernetics principles.