Safe reinforcement learning-based control using deep deterministic policy gradient algorithm and slime mould algorithm with experimental tower crane system validation
Iuliu Alexandru Zamfirache , Radu-Emil Precup , Emil M. Petriu
{"title":"Safe reinforcement learning-based control using deep deterministic policy gradient algorithm and slime mould algorithm with experimental tower crane system validation","authors":"Iuliu Alexandru Zamfirache , Radu-Emil Precup , Emil M. Petriu","doi":"10.1016/j.ins.2024.121640","DOIUrl":null,"url":null,"abstract":"<div><div>This paper presents a novel optimal control approach resulting from the combination between the safe Reinforcement Learning (RL) framework represented by a Deep Deterministic Policy Gradient (DDPG) algorithm and a Slime Mould Algorithm (SMA) as a representative nature-inspired optimization algorithm. The main drawbacks of the traditional DDPG-based safe RL optimal control approach are the possible instability of the control system caused by randomly generated initial values of the controller parameters and the lack of state safety guarantees in the first iterations of the learning process due to (i) and (ii): (i) the safety constraints are considered only in the DDPG-based training process of the controller, which is usually implemented as a neural network (NN); (ii) the initial values of the weights and the biases of the NN-based controller are initialized with randomly generated values. The proposed approach mitigates these drawbacks by initializing the parameters of the NN-based controller using SMA. The fitness function of the SMA-based initialization process is designed to incorporate state safety constraints into the search process, resulting in an initial NN-based controller with embedded state safety constraints. The proposed approach is compared to the classical one using real-time experimental results and performance indices popular for optimal reference tracking control problems and based on a state safety score.</div></div>","PeriodicalId":51063,"journal":{"name":"Information Sciences","volume":"692 ","pages":"Article 121640"},"PeriodicalIF":8.1000,"publicationDate":"2024-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information Sciences","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0020025524015548","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"0","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Abstract
This paper presents a novel optimal control approach that combines a safe Reinforcement Learning (RL) framework, represented by a Deep Deterministic Policy Gradient (DDPG) algorithm, with the Slime Mould Algorithm (SMA), a representative nature-inspired optimization algorithm. The traditional DDPG-based safe RL optimal control approach has two main drawbacks: possible instability of the control system caused by randomly generated initial values of the controller parameters, and a lack of state safety guarantees in the first iterations of the learning process. Both drawbacks stem from the facts that (i) the safety constraints are considered only in the DDPG-based training process of the controller, which is usually implemented as a neural network (NN), and (ii) the weights and biases of the NN-based controller are initialized with randomly generated values. The proposed approach mitigates these drawbacks by initializing the parameters of the NN-based controller using SMA. The fitness function of the SMA-based initialization process incorporates the state safety constraints into the search process, yielding an initial NN-based controller with embedded state safety constraints. The proposed approach is compared with the classical one using real-time experimental results, performance indices popular for optimal reference tracking control problems, and a state safety score.
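The SMA-based safe initialization described in the abstract lends itself to a compact sketch. The Python snippet below is a minimal illustration, not the authors' implementation: the double-integrator plant, the safe set |x1| ≤ 1.5, the penalty weight, the network size, and all SMA hyperparameters are illustrative assumptions, and the SMA update is a simplified form of the standard formulation. The best parameter vector found would then serve as the initial weights and biases of the DDPG actor.

```python
# Minimal sketch of SMA-based safe initialization of an NN controller.
# NOT the authors' implementation: plant, safe set, penalty, and all
# hyperparameters below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Tiny NN controller: state (2) -> hidden (8, tanh) -> control signal (1).
N_STATE, N_HIDDEN = 2, 8
DIM = N_STATE * N_HIDDEN + N_HIDDEN + N_HIDDEN + 1  # all weights and biases

def unpack(theta):
    i = N_STATE * N_HIDDEN
    W1 = theta[:i].reshape(N_HIDDEN, N_STATE)
    b1 = theta[i:i + N_HIDDEN]
    W2 = theta[i + N_HIDDEN:i + 2 * N_HIDDEN].reshape(1, N_HIDDEN)
    b2 = theta[i + 2 * N_HIDDEN:]
    return W1, b1, W2, b2

def controller(theta, x):
    W1, b1, W2, b2 = unpack(theta)
    return float(np.tanh(W2 @ np.tanh(W1 @ x + b1) + b2)[0])

# Fitness: tracking cost on an assumed double-integrator plant, plus a
# large penalty whenever the state leaves the assumed safe set |x1| <= 1.5.
DT, STEPS, REF, SAFE_BOUND, PENALTY = 0.05, 200, 1.0, 1.5, 100.0

def fitness(theta):
    x, cost = np.zeros(2), 0.0
    for _ in range(STEPS):
        u = controller(theta, x)
        x = x + DT * np.array([x[1], u])      # x1' = x2, x2' = u
        cost += DT * (REF - x[0]) ** 2        # ISE-like tracking term
        if abs(x[0]) > SAFE_BOUND:            # state safety constraint
            cost += PENALTY                   # embeds safety in the search
    return cost

# Simplified Slime Mould Algorithm over the controller parameter vector.
POP, ITERS, LB, UB, Z = 30, 100, -1.0, 1.0, 0.03

X = rng.uniform(LB, UB, (POP, DIM))
best_x, best_f = None, np.inf
for t in range(ITERS):
    F = np.array([fitness(x) for x in X])
    order = np.argsort(F)                     # best (lowest cost) first
    X, F = X[order], F[order]
    if F[0] < best_f:
        best_f, best_x = F[0], X[0].copy()
    bF, wF = F[0], F[-1]
    ratio = (F[:, None] - bF) / (wF - bF + 1e-12)   # 0 = best, 1 = worst
    r = rng.random((POP, DIM))
    W = np.where(np.arange(POP)[:, None] < POP // 2,
                 1 + r * np.log1p(ratio),     # better half: reinforce
                 1 - r * np.log1p(ratio))     # worse half: weaken
    a = np.arctanh(1 - (t + 1) / (ITERS + 1)) # oscillation range shrinks
    b = 1 - (t + 1) / (ITERS + 1)
    p = np.tanh(np.abs(F - best_f))           # prob. of approaching "food"
    Xnew = np.empty_like(X)
    for i in range(POP):
        if rng.random() < Z:                  # occasional random restart
            Xnew[i] = rng.uniform(LB, UB, DIM)
        elif rng.random() < p[i]:             # approach the best solution
            A, B = X[rng.integers(POP)], X[rng.integers(POP)]
            Xnew[i] = best_x + rng.uniform(-a, a, DIM) * (W[i] * A - B)
        else:                                 # contract toward the origin
            Xnew[i] = rng.uniform(-b, b, DIM) * X[i]
    X = np.clip(Xnew, LB, UB)

print(f"safe-initialization fitness: {best_f:.3f}")
# best_x would then be unpacked into the initial weights and biases of
# the DDPG actor before safe RL training starts.
```

The large constant PENALTY is one simple way to fold a state safety constraint into the fitness; the paper's actual fitness function and state safety score may be defined differently.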
Journal introduction:
Information Sciences (subtitled Informatics and Computer Science, Intelligent Systems, Applications) is an international journal that focuses on publishing original and creative research findings in the field of information sciences. We also feature a limited number of timely tutorial and survey contributions.
Our journal aims to cater to a diverse audience, including researchers, developers, managers, strategic planners, graduate students, and anyone interested in staying up-to-date with cutting-edge research in information science, knowledge engineering, and intelligent systems. While readers are expected to share a common interest in information science, they come from varying backgrounds such as engineering, mathematics, statistics, physics, computer science, cell biology, molecular biology, management science, cognitive science, neurobiology, behavioral sciences, and biochemistry.