Minghao Zhang, Bifeng Song, Changhao Chen, Xinyu Lang, Liang Wang
{"title":"Concertorl:有限时间单寿命增强控制的强化学习方法及其在直接驱动串联翼实验平台上的应用","authors":"Minghao Zhang, Bifeng Song, Changhao Chen, Xinyu Lang, Liang Wang","doi":"10.1007/s10489-024-05720-7","DOIUrl":null,"url":null,"abstract":"<div><p>Achieving control of mechanical systems using finite-time single-life methods presents significant challenges in safety and efficiency for existing control algorithms. To address these issues, the ConcertoRL algorithm is introduced, featuring two main innovations: a time-interleaved mechanism based on Lipschitz conditions that integrates classical controllers with reinforcement learning-based controllers to enhance initial stage safety under single-life conditions and a policy composer based on finite-time Lyapunov convergence conditions that organizes past learning experiences to ensure efficiency within finite time constraints. Experiments are conducted on Direct-Drive Tandem-Wing Experiment Platforms, a typical mechanical system operating under nonlinear unsteady load conditions. First, compared with established algorithms such as the Soft Actor-Critic (SAC) algorithm, Proximal Policy Optimization (PPO) algorithm, and Twin Delayed Deep Deterministic policy gradient (TD3) algorithm, ConcertoRL demonstrates nearly an order of magnitude performance advantage within the first 500 steps under finite-time single-life conditions. Second, ablation experiments on the time-interleaved mechanism show that introducing this module results in a performance improvement of nearly two orders of magnitude in single-life last average reward. Furthermore, the integration of this module yields a substantial performance boost of approximately 60% over scenarios without reinforcement learning enhancements and a 30% increase in efficiency compared to reference controllers operating at doubled control frequencies. These results highlight the algorithm's ability to create a synergistic effect that exceeds the sum of its parts. 
Third, ablation studies on the rule-based policy composer further verify its significant impact on enhancing ConcertoRL's convergence speed. Finally, experiments on the universality of the ConcertoRL framework demonstrate its compatibility with various classical controllers, consistently achieving excellent control outcomes. ConcertoRL offers a promising approach for mechanical systems under nonlinear, unsteady load conditions. It enables plug-and-play use with high control efficiency under finite-time, single-life constraints. This work sets a new benchmark in control effectiveness for challenges posed by direct-drive platforms under tandem wing influence.</p><h3>Graphical abstract</h3><div><figure><div><div><picture><source><img></source></picture></div></div></figure></div></div>","PeriodicalId":3,"journal":{"name":"ACS Applied Electronic Materials","volume":null,"pages":null},"PeriodicalIF":4.3000,"publicationDate":"2024-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Concertorl: A reinforcement learning approach for finite-time single-life enhanced control and its application to direct-drive tandem-wing experiment platforms\",\"authors\":\"Minghao Zhang, Bifeng Song, Changhao Chen, Xinyu Lang, Liang Wang\",\"doi\":\"10.1007/s10489-024-05720-7\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Achieving control of mechanical systems using finite-time single-life methods presents significant challenges in safety and efficiency for existing control algorithms. 
To address these issues, the ConcertoRL algorithm is introduced, featuring two main innovations: a time-interleaved mechanism based on Lipschitz conditions that integrates classical controllers with reinforcement learning-based controllers to enhance initial stage safety under single-life conditions and a policy composer based on finite-time Lyapunov convergence conditions that organizes past learning experiences to ensure efficiency within finite time constraints. Experiments are conducted on Direct-Drive Tandem-Wing Experiment Platforms, a typical mechanical system operating under nonlinear unsteady load conditions. First, compared with established algorithms such as the Soft Actor-Critic (SAC) algorithm, Proximal Policy Optimization (PPO) algorithm, and Twin Delayed Deep Deterministic policy gradient (TD3) algorithm, ConcertoRL demonstrates nearly an order of magnitude performance advantage within the first 500 steps under finite-time single-life conditions. Second, ablation experiments on the time-interleaved mechanism show that introducing this module results in a performance improvement of nearly two orders of magnitude in single-life last average reward. Furthermore, the integration of this module yields a substantial performance boost of approximately 60% over scenarios without reinforcement learning enhancements and a 30% increase in efficiency compared to reference controllers operating at doubled control frequencies. These results highlight the algorithm's ability to create a synergistic effect that exceeds the sum of its parts. Third, ablation studies on the rule-based policy composer further verify its significant impact on enhancing ConcertoRL's convergence speed. Finally, experiments on the universality of the ConcertoRL framework demonstrate its compatibility with various classical controllers, consistently achieving excellent control outcomes. ConcertoRL offers a promising approach for mechanical systems under nonlinear, unsteady load conditions. 
It enables plug-and-play use with high control efficiency under finite-time, single-life constraints. This work sets a new benchmark in control effectiveness for challenges posed by direct-drive platforms under tandem wing influence.</p><h3>Graphical abstract</h3><div><figure><div><div><picture><source><img></source></picture></div></div></figure></div></div>\",\"PeriodicalId\":3,\"journal\":{\"name\":\"ACS Applied Electronic Materials\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":4.3000,\"publicationDate\":\"2024-10-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACS Applied Electronic Materials\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://link.springer.com/article/10.1007/s10489-024-05720-7\",\"RegionNum\":3,\"RegionCategory\":\"材料科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACS Applied Electronic Materials","FirstCategoryId":"94","ListUrlMain":"https://link.springer.com/article/10.1007/s10489-024-05720-7","RegionNum":3,"RegionCategory":"材料科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
ConcertoRL: A reinforcement learning approach for finite-time single-life enhanced control and its application to direct-drive tandem-wing experiment platforms
Controlling mechanical systems under finite-time, single-life conditions poses significant safety and efficiency challenges for existing control algorithms. To address these issues, the ConcertoRL algorithm is introduced, featuring two main innovations: a time-interleaved mechanism, based on Lipschitz conditions, that integrates classical controllers with reinforcement-learning-based controllers to improve initial-stage safety under single-life conditions; and a policy composer, based on finite-time Lyapunov convergence conditions, that organizes past learning experiences to ensure efficiency within finite time constraints. Experiments are conducted on Direct-Drive Tandem-Wing Experiment Platforms, a typical mechanical system operating under nonlinear, unsteady load conditions. First, compared with established algorithms such as Soft Actor-Critic (SAC), Proximal Policy Optimization (PPO), and Twin Delayed Deep Deterministic policy gradient (TD3), ConcertoRL demonstrates nearly an order-of-magnitude performance advantage within the first 500 steps under finite-time single-life conditions. Second, ablation experiments on the time-interleaved mechanism show that introducing this module improves the single-life final average reward by nearly two orders of magnitude. Furthermore, integrating this module yields a performance boost of approximately 60% over scenarios without reinforcement-learning enhancement and a 30% efficiency gain over reference controllers operating at doubled control frequencies. These results highlight the algorithm's ability to create a synergistic effect that exceeds the sum of its parts. Third, ablation studies on the rule-based policy composer further verify its significant impact on ConcertoRL's convergence speed.
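The paper itself does not give the implementation of the time-interleaved mechanism; the following is a minimal illustrative sketch of the general idea, under the assumption that classical and RL control actions alternate in time and that an RL action is rejected in favor of the classical fallback when it violates a Lipschitz-style bound on the change in control input. All function names, gains, and the plant model here are hypothetical, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def classical_controller(error):
    # Simple proportional law standing in for the classical baseline controller.
    return -1.5 * error

def rl_policy(error):
    # Placeholder for a learned policy; here just a noisy proportional law.
    return -1.2 * error + 0.1 * rng.standard_normal()

def interleaved_step(error, prev_action, step, lipschitz_bound=0.5):
    """Alternate classical and RL actions in time; reject an RL action whose
    change from the previous action exceeds a Lipschitz-style bound, falling
    back to the classical controller for safety."""
    if step % 2 == 0:
        return classical_controller(error)
    candidate = rl_policy(error)
    if abs(candidate - prev_action) > lipschitz_bound:
        return classical_controller(error)  # safety fallback
    return candidate

# Drive a scalar first-order plant x_{k+1} = x_k + dt * u_k toward zero.
state, action = 1.0, 0.0
for step in range(20):
    action = interleaved_step(state, action, step)
    state = state + 0.1 * action
print(abs(state))
```

Because every other step is guaranteed to be the classical controller, and the RL action is bounded relative to the previous action, the early single-life phase never strays far from the baseline behavior while still giving the learned policy influence on alternate steps.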
Finally, experiments on the universality of the ConcertoRL framework demonstrate its compatibility with various classical controllers, consistently achieving excellent control outcomes. ConcertoRL offers a promising approach for mechanical systems under nonlinear, unsteady load conditions, enabling plug-and-play use with high control efficiency under finite-time, single-life constraints. This work sets a new benchmark in control effectiveness for the challenges posed by direct-drive platforms under tandem-wing influence.
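The abstract describes the policy composer as organizing past learning experiences under finite-time Lyapunov convergence conditions. The sketch below is a hedged guess at one way such a composer could work: each stored policy snapshot records how fast it contracted a Lyapunov-like value V (e.g., squared tracking error), and the composer selects the snapshot whose observed rate would drive V below a target within the remaining finite-time horizon. The data, field names, and selection rule are all illustrative assumptions, not the paper's actual method.

```python
import math

# Hypothetical records of stored policy snapshots and the decrease of a
# Lyapunov-like value V observed while each policy was active.
snapshots = [
    {"name": "policy_a", "v_start": 4.0, "v_end": 2.0, "steps": 100},
    {"name": "policy_b", "v_start": 4.0, "v_end": 0.5, "steps": 100},
    {"name": "policy_c", "v_start": 4.0, "v_end": 3.5, "steps": 100},
]

def decay_rate(snap):
    # Exponential contraction rate c such that V_end ≈ V_start * exp(-c * steps).
    return math.log(snap["v_start"] / snap["v_end"]) / snap["steps"]

def compose(snapshots, horizon, v_now=4.0, v_target=0.1):
    """Select the stored policy whose observed contraction rate would bring
    V below v_target within the remaining horizon; if none qualifies,
    fall back to the fastest-contracting policy."""
    feasible = [s for s in snapshots
                if v_now * math.exp(-decay_rate(s) * horizon) <= v_target]
    pool = feasible or snapshots
    return max(pool, key=decay_rate)["name"]

print(compose(snapshots, horizon=200))
```

With a 200-step horizon only the fastest-contracting snapshot meets the finite-time target, so the composer selects it; this mirrors the abstract's claim that organizing past experience by convergence behavior speeds up learning within a fixed budget.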