Minghao Zhang, Bifeng Song, Changhao Chen, Xinyu Lang, Liang Wang
{"title":"Concertorl:有限时间单寿命增强控制的强化学习方法及其在直接驱动串联翼实验平台上的应用","authors":"Minghao Zhang, Bifeng Song, Changhao Chen, Xinyu Lang, Liang Wang","doi":"10.1007/s10489-024-05720-7","DOIUrl":null,"url":null,"abstract":"<div><p>Achieving control of mechanical systems using finite-time single-life methods presents significant challenges in safety and efficiency for existing control algorithms. To address these issues, the ConcertoRL algorithm is introduced, featuring two main innovations: a time-interleaved mechanism based on Lipschitz conditions that integrates classical controllers with reinforcement learning-based controllers to enhance initial stage safety under single-life conditions and a policy composer based on finite-time Lyapunov convergence conditions that organizes past learning experiences to ensure efficiency within finite time constraints. Experiments are conducted on Direct-Drive Tandem-Wing Experiment Platforms, a typical mechanical system operating under nonlinear unsteady load conditions. First, compared with established algorithms such as the Soft Actor-Critic (SAC) algorithm, Proximal Policy Optimization (PPO) algorithm, and Twin Delayed Deep Deterministic policy gradient (TD3) algorithm, ConcertoRL demonstrates nearly an order of magnitude performance advantage within the first 500 steps under finite-time single-life conditions. Second, ablation experiments on the time-interleaved mechanism show that introducing this module results in a performance improvement of nearly two orders of magnitude in single-life last average reward. Furthermore, the integration of this module yields a substantial performance boost of approximately 60% over scenarios without reinforcement learning enhancements and a 30% increase in efficiency compared to reference controllers operating at doubled control frequencies. These results highlight the algorithm's ability to create a synergistic effect that exceeds the sum of its parts. 
Third, ablation studies on the rule-based policy composer further verify its significant impact on enhancing ConcertoRL's convergence speed. Finally, experiments on the universality of the ConcertoRL framework demonstrate its compatibility with various classical controllers, consistently achieving excellent control outcomes. ConcertoRL offers a promising approach for mechanical systems under nonlinear, unsteady load conditions. It enables plug-and-play use with high control efficiency under finite-time, single-life constraints. This work sets a new benchmark in control effectiveness for challenges posed by direct-drive platforms under tandem wing influence.</p><h3>Graphical abstract</h3><div><figure><div><div><picture><source><img></source></picture></div></div></figure></div></div>","PeriodicalId":3,"journal":{"name":"ACS Applied Electronic Materials","volume":null,"pages":null},"PeriodicalIF":4.3000,"publicationDate":"2024-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Concertorl: A reinforcement learning approach for finite-time single-life enhanced control and its application to direct-drive tandem-wing experiment platforms\",\"authors\":\"Minghao Zhang, Bifeng Song, Changhao Chen, Xinyu Lang, Liang Wang\",\"doi\":\"10.1007/s10489-024-05720-7\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Achieving control of mechanical systems using finite-time single-life methods presents significant challenges in safety and efficiency for existing control algorithms. 
To address these issues, the ConcertoRL algorithm is introduced, featuring two main innovations: a time-interleaved mechanism based on Lipschitz conditions that integrates classical controllers with reinforcement learning-based controllers to enhance initial stage safety under single-life conditions and a policy composer based on finite-time Lyapunov convergence conditions that organizes past learning experiences to ensure efficiency within finite time constraints. Experiments are conducted on Direct-Drive Tandem-Wing Experiment Platforms, a typical mechanical system operating under nonlinear unsteady load conditions. First, compared with established algorithms such as the Soft Actor-Critic (SAC) algorithm, Proximal Policy Optimization (PPO) algorithm, and Twin Delayed Deep Deterministic policy gradient (TD3) algorithm, ConcertoRL demonstrates nearly an order of magnitude performance advantage within the first 500 steps under finite-time single-life conditions. Second, ablation experiments on the time-interleaved mechanism show that introducing this module results in a performance improvement of nearly two orders of magnitude in single-life last average reward. Furthermore, the integration of this module yields a substantial performance boost of approximately 60% over scenarios without reinforcement learning enhancements and a 30% increase in efficiency compared to reference controllers operating at doubled control frequencies. These results highlight the algorithm's ability to create a synergistic effect that exceeds the sum of its parts. Third, ablation studies on the rule-based policy composer further verify its significant impact on enhancing ConcertoRL's convergence speed. Finally, experiments on the universality of the ConcertoRL framework demonstrate its compatibility with various classical controllers, consistently achieving excellent control outcomes. ConcertoRL offers a promising approach for mechanical systems under nonlinear, unsteady load conditions. 
It enables plug-and-play use with high control efficiency under finite-time, single-life constraints. This work sets a new benchmark in control effectiveness for challenges posed by direct-drive platforms under tandem wing influence.</p><h3>Graphical abstract</h3><div><figure><div><div><picture><source><img></source></picture></div></div></figure></div></div>\",\"PeriodicalId\":3,\"journal\":{\"name\":\"ACS Applied Electronic Materials\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":4.3000,\"publicationDate\":\"2024-10-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACS Applied Electronic Materials\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://link.springer.com/article/10.1007/s10489-024-05720-7\",\"RegionNum\":3,\"RegionCategory\":\"材料科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACS Applied Electronic Materials","FirstCategoryId":"94","ListUrlMain":"https://link.springer.com/article/10.1007/s10489-024-05720-7","RegionNum":3,"RegionCategory":"材料科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
ConcertoRL: A reinforcement learning approach for finite-time single-life enhanced control and its application to direct-drive tandem-wing experiment platforms
Controlling mechanical systems under finite-time, single-life conditions poses significant safety and efficiency challenges for existing control algorithms. To address these issues, the ConcertoRL algorithm is introduced, featuring two main innovations: a time-interleaved mechanism, based on Lipschitz conditions, that integrates classical controllers with reinforcement-learning-based controllers to improve initial-stage safety under single-life conditions; and a policy composer, based on finite-time Lyapunov convergence conditions, that organizes past learning experiences to ensure efficiency within finite time constraints. Experiments are conducted on Direct-Drive Tandem-Wing Experiment Platforms, a typical mechanical system operating under nonlinear, unsteady load conditions. First, compared with established algorithms such as Soft Actor-Critic (SAC), Proximal Policy Optimization (PPO), and Twin Delayed Deep Deterministic policy gradient (TD3), ConcertoRL demonstrates nearly an order-of-magnitude performance advantage within the first 500 steps under finite-time single-life conditions. Second, ablation experiments on the time-interleaved mechanism show that introducing this module improves the single-life final average reward by nearly two orders of magnitude. Furthermore, integrating this module yields a performance boost of approximately 60% over scenarios without reinforcement-learning enhancement and a 30% efficiency gain over reference controllers operating at doubled control frequencies. These results highlight the algorithm's ability to create a synergistic effect that exceeds the sum of its parts. Third, ablation studies on the rule-based policy composer further verify its significant impact on ConcertoRL's convergence speed.
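The paper itself does not give the implementation of the time-interleaved mechanism; the following is a minimal illustrative sketch of the general idea, under the assumption that classical and RL control actions alternate in time and that an RL action is rejected in favor of the classical fallback when it violates a Lipschitz-style bound on the change in control input. All function names, gains, and the plant model here are hypothetical, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def classical_controller(error):
    # Simple proportional law standing in for the classical baseline controller.
    return -1.5 * error

def rl_policy(error):
    # Placeholder for a learned policy; here just a noisy proportional law.
    return -1.2 * error + 0.1 * rng.standard_normal()

def interleaved_step(error, prev_action, step, lipschitz_bound=0.5):
    """Alternate classical and RL actions in time; reject an RL action whose
    change from the previous action exceeds a Lipschitz-style bound, falling
    back to the classical controller for safety."""
    if step % 2 == 0:
        return classical_controller(error)
    candidate = rl_policy(error)
    if abs(candidate - prev_action) > lipschitz_bound:
        return classical_controller(error)  # safety fallback
    return candidate

# Drive a scalar first-order plant x_{k+1} = x_k + dt * u_k toward zero.
state, action = 1.0, 0.0
for step in range(20):
    action = interleaved_step(state, action, step)
    state = state + 0.1 * action
print(abs(state))
```

Because every other step is guaranteed to be the classical controller, and the RL action is bounded relative to the previous action, the early single-life phase never strays far from the baseline behavior while still giving the learned policy influence on alternate steps.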
Finally, experiments on the universality of the ConcertoRL framework demonstrate its compatibility with various classical controllers, consistently achieving excellent control outcomes. ConcertoRL offers a promising approach for mechanical systems under nonlinear, unsteady load conditions, enabling plug-and-play use with high control efficiency under finite-time, single-life constraints. This work sets a new benchmark in control effectiveness for the challenges posed by direct-drive platforms under tandem-wing influence.
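The abstract describes the policy composer as organizing past learning experiences under finite-time Lyapunov convergence conditions. The sketch below is a hedged guess at one way such a composer could work: each stored policy snapshot records how fast it contracted a Lyapunov-like value V (e.g., squared tracking error), and the composer selects the snapshot whose observed rate would drive V below a target within the remaining finite-time horizon. The data, field names, and selection rule are all illustrative assumptions, not the paper's actual method.

```python
import math

# Hypothetical records of stored policy snapshots and the decrease of a
# Lyapunov-like value V observed while each policy was active.
snapshots = [
    {"name": "policy_a", "v_start": 4.0, "v_end": 2.0, "steps": 100},
    {"name": "policy_b", "v_start": 4.0, "v_end": 0.5, "steps": 100},
    {"name": "policy_c", "v_start": 4.0, "v_end": 3.5, "steps": 100},
]

def decay_rate(snap):
    # Exponential contraction rate c such that V_end ≈ V_start * exp(-c * steps).
    return math.log(snap["v_start"] / snap["v_end"]) / snap["steps"]

def compose(snapshots, horizon, v_now=4.0, v_target=0.1):
    """Select the stored policy whose observed contraction rate would bring
    V below v_target within the remaining horizon; if none qualifies,
    fall back to the fastest-contracting policy."""
    feasible = [s for s in snapshots
                if v_now * math.exp(-decay_rate(s) * horizon) <= v_target]
    pool = feasible or snapshots
    return max(pool, key=decay_rate)["name"]

print(compose(snapshots, horizon=200))
```

With a 200-step horizon only the fastest-contracting snapshot meets the finite-time target, so the composer selects it; this mirrors the abstract's claim that organizing past experience by convergence behavior speeds up learning within a fixed budget.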