Minimum-Cost State-Flipped Control for Reachability of Boolean Control Networks Using Reinforcement Learning

IEEE Transactions on Cybernetics, vol. 54, no. 11, pp. 7103-7115
Published: 2024-09-17 | DOI: 10.1109/TCYB.2024.3454253
Impact Factor: 9.4 | CAS Tier 1 (Computer Science) | JCR Q1 (Automation & Control Systems)
Authors: Jingjie Ni; Yang Tang; Fangfei Li
Citations: 0

Abstract

This article proposes model-free reinforcement learning methods for minimum-cost state-flipped control in Boolean control networks (BCNs). We tackle two questions: 1) finding the flipping kernel, namely, the flip set with the smallest cardinality ensuring reachability and 2) deriving optimal policies to minimize the number of flipping actions for reachability based on the obtained flipping kernel. For Question 1), Q-learning’s capability in determining reachability is demonstrated. To expedite convergence, we incorporate two improvements: 1) demonstrating that previously reachable states remain reachable after adding elements to the flip set, followed by employing transfer learning and 2) initiating each episode with special initial states whose reachability to the target state set is currently unknown. For Question 2), it is challenging to encapsulate the objective of simultaneously reducing control costs and satisfying terminal constraints exclusively through the reward function employed in the Q-learning framework. To bridge this gap, we propose a BCN-characteristics-based reward scheme and prove its optimality. Questions 1) and 2) with large-scale BCNs are addressed by employing small memory Q-learning, which reduces memory usage by only recording visited action-values. An upper bound on memory usage is provided to assess the algorithm’s feasibility. To expedite convergence for Question 2) in large-scale BCNs, we introduce adaptive variable rewards based on the known maximum steps needed to reach the target state set without cycles. Finally, the effectiveness of the proposed methods is validated on both small- and large-scale BCNs.
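As a rough illustration of the dictionary-based ("small memory") Q-learning idea described in the abstract, the Python sketch below stores the Q-table as a plain dict, so memory grows only with the state-action pairs actually visited. The three-node network, its update rules, the flip set, the target set, and the reward values are hypothetical placeholders for illustration, not the authors' benchmarks; in particular, the -len(flip) term only stands in for the flipping cost, whereas the article uses a BCN-characteristics-based reward scheme with proven optimality and adaptive variable rewards.

# Illustrative sketch (not the authors' code): small-memory Q-learning for
# reachability of a toy 3-node Boolean control network under state-flipped control.
import random

def bcn_step(x, u, flip):
    """One transition of a toy BCN: flip the selected nodes, then apply
    assumed Boolean update rules (placeholders, for illustration only)."""
    x = tuple(xi ^ (i in flip) for i, xi in enumerate(x))  # state-flipped control
    x1 = x[1] and u
    x2 = x[0] or x[2]
    x3 = not x[1]
    return (int(x1), int(x2), int(x3))

TARGET = {(1, 1, 0)}                         # assumed target state set
ACTIONS = [(u, frozenset(s))                 # joint action: control input + flip subset
           for u in (0, 1)
           for s in ((), (0,), (2,), (0, 2))]  # assumed flipping kernel {node 0, node 2}

Q = {}                                       # only visited action-values get an entry
alpha, gamma, eps = 0.1, 0.95, 0.2

def q(x, a):
    return Q.get((x, a), 0.0)

def choose(x):
    if random.random() < eps:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q(x, a))

for episode in range(5000):
    x = tuple(random.randint(0, 1) for _ in range(3))  # random initial state
    for _ in range(20):                                 # episode horizon
        u, flip = choose(x)
        x_next = bcn_step(x, u, flip)
        # Reward sketch: penalize each flip (control cost), reward reaching the target.
        r = -len(flip) + (10 if x_next in TARGET else 0)
        best_next = max(q(x_next, a) for a in ACTIONS)
        Q[(x, (u, flip))] = q(x, (u, flip)) + alpha * (r + gamma * best_next - q(x, (u, flip)))
        x = x_next
        if x in TARGET:
            break

print("action-values stored:", len(Q))       # memory grows only with visited pairs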
About the Journal

IEEE Transactions on Cybernetics (Computer Science, Artificial Intelligence; Computer Science, Cybernetics)
CiteScore: 25.40 | Self-citation rate: 11.00% | Articles per year: 1869

Journal scope: The scope of the IEEE Transactions on Cybernetics includes computational approaches to the field of cybernetics. Specifically, the transactions welcomes papers on communication and control across machines, or between machines, humans, and organizations. The scope includes such areas as computational intelligence, computer vision, neural networks, genetic algorithms, machine learning, fuzzy systems, cognitive systems, decision making, and robotics, to the extent that they contribute to the theme of cybernetics or demonstrate an application of cybernetics principles.