A plug-and-play, fully on-the-job real-time reinforcement learning algorithm for a direct-drive tandem-wing experiment platform under multiple random operating conditions
Zhang Minghao, Song Bifeng, Yang Xiaojun, Wang Liang
{"title":"A plug-and-play fully on-the-job real-time reinforcement learning algorithm for a direct-drive tandem-wing experiment platforms under multiple random operating conditions","authors":"Zhang Minghao , Song Bifeng , Yang Xiaojun , Wang Liang","doi":"10.1016/j.engappai.2025.110373","DOIUrl":null,"url":null,"abstract":"<div><div>This study addresses the motion control problem of the Direct-Drive Tandem-Wing Experiment Platform (DDTWEP), focusing on designing effective direct and transitional operating strategies for pitch, roll, and yaw under nonlinear, unsteady aerodynamic interference caused by high-frequency oscillations and closely spaced tandem wings by leveraging advanced artificial intelligence (AI) techniques. The Concerto Reinforcement Learning Extension (CRL2E) algorithm, a novel AI approach, is proposed to tackle this challenge, featuring the innovative Physics-Inspired Rule-Based Policy Composer strategy and experimental validation. The results demonstrate that the CRL2E algorithm maintains safety and efficiency throughout the training process, even with randomly initialized policy weights. In DDTWEP's plug-and-play, fully on-the-job motion control problem, the algorithm achieves a performance improvement of at least fourteen-fold and up to sixty-six-fold within the first five hundred interactions compared to Soft Actor-Critic (SAC), Proximal Policy Optimization (PPO), and Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithms. Furthermore, to further verify the rationality and performance of the module and algorithm design, this study introduces two perturbations: Time-Interleaved Capability Perturbation and Composer Perturbation, and develops multiple algorithms for comparative experiments. The experimental results show that compared to existing Concerto Reinforcement Learning (CRL) frameworks, the CRL2E algorithm achieves an 8.3%–60.4% enhancement in tracking accuracy, a 36.11%–57.64% improvement in convergence speed over the CRL with Composer Perturbation algorithm, and a 43.52%–65.85% improvement over the CRL with Time-Interleaved Capability Perturbation and Composer Perturbation algorithms, indicating the rationality of the CRL2E algorithm design. Regarding generalizability, the CRL2E algorithm demonstrates significant applicability in quadrotor flight control, highlighting its potential versatility. From a technical affinity perspective, the CRL2E algorithm is well-suited for integrating pretraining techniques, demonstrating excellent safety and efficiency in addressing cross-task plug-and-play and fully on-the-job fine-tuning problems. Regarding deplorability, hardware requirements were analyzed through ten thousand runs on diverse edge computing platforms, computational models, and operating systems to guide real-world deployment. Based on the experimental results, a real-time hardware-in-the-loop simulation system was constructed to validate the algorithm's effectiveness under realistic conditions. Additionally, an innovative yaw mechanism and its corresponding system model are introduced in this study to enhance the complexity of the system dynamics. 
These contributions provide valuable insights for addressing motion control challenges in complex mechanical systems.</div></div>","PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":"148 ","pages":"Article 110373"},"PeriodicalIF":7.5000,"publicationDate":"2025-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Engineering Applications of Artificial Intelligence","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0952197625003732","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"AUTOMATION & CONTROL SYSTEMS","Score":null,"Total":0}
引用次数: 0
Abstract
This study addresses the motion control problem of the Direct-Drive Tandem-Wing Experiment Platform (DDTWEP), focusing on designing effective direct and transitional operating strategies for pitch, roll, and yaw under the nonlinear, unsteady aerodynamic interference caused by high-frequency oscillations and closely spaced tandem wings, by leveraging advanced artificial intelligence (AI) techniques. The Concerto Reinforcement Learning Extension (CRL2E) algorithm, a novel AI approach, is proposed to tackle this challenge; it features an innovative Physics-Inspired Rule-Based Policy Composer strategy and is validated experimentally. The results demonstrate that the CRL2E algorithm maintains safety and efficiency throughout training, even when policy weights are randomly initialized. On the DDTWEP's plug-and-play, fully on-the-job motion control problem, the algorithm achieves a performance improvement of at least fourteen-fold and up to sixty-six-fold within the first five hundred interactions compared with the Soft Actor-Critic (SAC), Proximal Policy Optimization (PPO), and Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithms. To further verify the rationality and performance of the module and algorithm design, this study introduces two perturbations, Time-Interleaved Capability Perturbation and Composer Perturbation, and develops multiple algorithm variants for comparative experiments. The experimental results show that, compared with existing Concerto Reinforcement Learning (CRL) frameworks, the CRL2E algorithm achieves an 8.3%–60.4% improvement in tracking accuracy, a 36.11%–57.64% improvement in convergence speed over the CRL with Composer Perturbation algorithm, and a 43.52%–65.85% improvement over the CRL with Time-Interleaved Capability Perturbation and Composer Perturbation algorithm, confirming the rationality of the CRL2E design. Regarding generalizability, the CRL2E algorithm also demonstrates strong applicability to quadrotor flight control, highlighting its versatility. From a technical-affinity perspective, the CRL2E algorithm is well suited to integration with pretraining techniques, demonstrating excellent safety and efficiency on cross-task plug-and-play and fully on-the-job fine-tuning problems. Regarding deployability, hardware requirements were analyzed through ten thousand runs on diverse edge computing platforms, computational models, and operating systems to guide real-world deployment. Based on these results, a real-time hardware-in-the-loop simulation system was constructed to validate the algorithm's effectiveness under realistic conditions. Additionally, an innovative yaw mechanism and its corresponding system model are introduced to increase the complexity of the system dynamics. These contributions provide valuable insights for addressing motion control challenges in complex mechanical systems.
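The abstract attributes the algorithm's safe start from randomly initialized policy weights to its Physics-Inspired Rule-Based Policy Composer. As an illustration only, the sketch below shows one generic way such a composer can be structured: a physics-inspired rule action is blended with a learned action, with the blending weight shifting toward the learned policy over the first interactions. The class names, the PD-style rule, the gains, and the ramp schedule are assumptions for illustration and are not taken from the paper.

# Illustrative sketch (not the authors' implementation) of a rule-based policy composer:
# blend a physics-inspired rule policy with a learned policy so early interactions stay
# safe while online learning proceeds. All names and constants below are hypothetical.

import numpy as np


def rule_policy(state: np.ndarray) -> np.ndarray:
    """Hypothetical physics-inspired rule: a PD-like correction toward zero attitude error.

    state = [pitch_err, roll_err, yaw_err, pitch_rate, roll_rate, yaw_rate]
    """
    kp, kd = 2.0, 0.5                      # illustrative gains, not from the paper
    errors, rates = state[:3], state[3:]
    return np.clip(-kp * errors - kd * rates, -1.0, 1.0)


class PolicyComposer:
    """Compose rule-based and learned actions: a = (1 - w) * rule + w * learned.

    The weight w ramps from 0 (pure rule, safe with untrained network weights)
    toward 1 (pure learned policy) over `ramp_steps` interactions.
    """

    def __init__(self, ramp_steps: int = 500):
        self.ramp_steps = ramp_steps
        self.step = 0

    def compose(self, state: np.ndarray, learned_action: np.ndarray) -> np.ndarray:
        w = min(1.0, self.step / self.ramp_steps)
        self.step += 1
        return (1.0 - w) * rule_policy(state) + w * learned_action


if __name__ == "__main__":
    composer = PolicyComposer(ramp_steps=500)
    rng = np.random.default_rng(0)
    state = rng.normal(size=6)
    learned_action = rng.uniform(-1.0, 1.0, size=3)   # stand-in for an actor network output
    print(composer.compose(state, learned_action))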
About the journal:
Artificial Intelligence (AI) is pivotal in driving the fourth industrial revolution, witnessing remarkable advancements across various machine learning methodologies. AI techniques have become indispensable tools for practicing engineers, enabling them to tackle previously insurmountable challenges. Engineering Applications of Artificial Intelligence serves as a global platform for the swift dissemination of research elucidating the practical application of AI methods across all engineering disciplines. Submitted papers are expected to present novel aspects of AI utilized in real-world engineering applications, validated using publicly available datasets to ensure the replicability of research outcomes. Join us in exploring the transformative potential of AI in engineering.