Application of an Off-Policy Reinforcement Learning Algorithm for $H_{\infty}$ Control Design of Nonlinear Structural Systems With Completely Unknown Dynamics

Earthquake Engineering & Structural Dynamics | IF: 5.0 | Zone 2 (Engineering & Technology) | JCR Q1 (ENGINEERING, CIVIL) | Pub Date: 2025-01-13 | DOI: 10.1002/eqe.4299

M. Amirmojahedi, A. Mojoodi, Saeed Shojaee, Saleh Hamzehei-Javaran

Volume 54, Issue 4, pages 1210-1228.

Citations: 0

Abstract

This paper proposes a model-free, online, off-policy reinforcement learning (RL) algorithm for attenuating the vibration of earthquake-excited structures by designing an optimal $H_{\infty}$ controller. The design requires solving a two-player zero-sum game governed by a Hamilton–Jacobi–Isaacs (HJI) equation, which is extremely difficult, and often impossible, to solve for the value function and the associated optimal controller. The proposed strategy uses an actor-critic-disturbance structure to learn the solution of the HJI equation online and forward in time, without requiring any knowledge of the system dynamics. The control policy, the disturbance policy, and the value function are approximated by the actor, disturbance, and critic neural networks (NNs), respectively.
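For orientation, the zero-sum game and HJI equation mentioned above can be written in the form commonly used in the $H_{\infty}$ literature. This is a sketch under assumed notation: the abstract does not give the paper's symbols, so the input-affine dynamics $f$, $g$, $k$, the state penalty $Q$, the control weight $R$, and the attenuation level $\gamma$ are all assumptions introduced here.

```latex
% Assumed input-affine dynamics with control u and disturbance w
\dot{x} = f(x) + g(x)\,u + k(x)\,w

% Zero-sum game value: u minimizes, w maximizes
V(x(t)) = \int_t^{\infty} \left( Q(x) + u^{\top} R\, u - \gamma^{2} w^{\top} w \right) d\tau

% Stationarity of the Hamiltonian gives the saddle-point policies
u^{*} = -\tfrac{1}{2} R^{-1} g(x)^{\top} \nabla V^{*}, \qquad
w^{*} = \tfrac{1}{2\gamma^{2}} k(x)^{\top} \nabla V^{*}

% Substituting u^* and w^* back yields the HJI equation
0 = Q(x) + (\nabla V^{*})^{\top} f(x)
    - \tfrac{1}{4} (\nabla V^{*})^{\top} g(x) R^{-1} g(x)^{\top} \nabla V^{*}
    + \tfrac{1}{4\gamma^{2}} (\nabla V^{*})^{\top} k(x) k(x)^{\top} \nabla V^{*}
```

The HJI equation is quadratic in $\nabla V^{*}$, which is why a closed-form solution is rarely available for nonlinear $f$, $g$, and $k$.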

Using policy iteration, the NN weights are computed by the least-squares (LS) method at each iteration. The convergence of the proposed algorithm is investigated through two distinct examples. The performance of this off-policy RL strategy is then studied in reducing the response of a seismically excited nonlinear structure equipped with an active mass damper (AMD), for two cases of state feedback. The simulation results demonstrate the effectiveness of the proposed algorithm for civil engineering structures.
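To illustrate the general idea of policy iteration with a least-squares critic fit, the sketch below solves a scalar linear zero-sum game. It is not the paper's algorithm: it is model-based (it uses the assumed coefficients `a`, `b`, `d` directly in the Bellman residual), whereas the paper's method is model-free and off-policy. The system, the cost weights, the single basis function $x^2$, and the function name `ls_policy_iteration` are all hypothetical choices made for this example.

```python
import numpy as np


def ls_policy_iteration(a=-1.0, b=1.0, d=1.0, q=1.0, r=1.0, gamma=2.0,
                        n_iter=20, n_samples=50, seed=0):
    """Model-based policy iteration for the assumed scalar zero-sum game
        x' = a*x + b*u + d*w,  running cost  q*x^2 + r*u^2 - gamma^2*w^2.
    Critic: V(x) = p*x^2.  Policies: u = -ku*x (control), w = kw*x (disturbance).
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=n_samples)  # sampled states for the LS fit
    ku, kw = 0.0, 0.0  # initial policies; a < 0 keeps the initial loop stable
    p = 0.0
    for _ in range(n_iter):
        # Policy evaluation: enforce 0 = cost(x) + dV/dx * x' at every sample,
        # with dV/dx = 2*p*x, and solve for the critic weight p by least squares.
        closed_loop = a - b * ku + d * kw
        regressor = (2.0 * closed_loop * x**2).reshape(-1, 1)
        target = -(q + r * ku**2 - gamma**2 * kw**2) * x**2
        p = np.linalg.lstsq(regressor, target, rcond=None)[0][0]
        # Policy improvement: both players are updated from the new critic.
        ku = b * p / r
        kw = d * p / gamma**2
    return p, ku, kw


p, ku, kw = ls_policy_iteration()
# Residual of the scalar game Riccati equation, evaluated with the default
# parameters (a=-1, b=1, d=1, q=1, r=1, gamma=2); it should be close to zero.
residual = 2 * (-1.0) * p + 1.0 - p**2 * (1.0 / 1.0 - 1.0 / 2.0**2)
print(f"p = {p:.6f}, ku = {ku:.6f}, kw = {kw:.6f}, residual = {residual:.2e}")
```

Because the critic here has a single weight and the Bellman residual is exact, the least-squares step reduces to a scalar division. With a richer basis and measured trajectory data in place of the model terms, the same loop structure would require a genuine regression at each iteration.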


Source journal

Earthquake Engineering & Structural Dynamics (Engineering & Technology - Engineering: Geological)

CiteScore: 7.20

Self-citation rate: 13.30%

Articles published: 180

Review time: 4.8 months

About the journal: Earthquake Engineering and Structural Dynamics provides a forum for the publication of papers on several aspects of engineering related to earthquakes. The problems in this field, and their solutions, are international in character and require knowledge of several traditional disciplines; the Journal will reflect this. Papers that may be relevant but do not emphasize earthquake engineering and related structural dynamics are not suitable for the Journal. Relevant topics include the following: ground motions for analysis and design; geotechnical earthquake engineering; probabilistic and deterministic methods of dynamic analysis; experimental behaviour of structures; seismic protective systems; system identification; risk assessment; seismic code requirements; methods for earthquake-resistant design and retrofit of structures.