Authors: Xiaochen Liu, Sipeng Wang, Xingxing Li, Ze Cui
Journal: Symmetry (journal article), published 2024-09-18
DOI: 10.3390/sym16091227 (https://doi.org/10.3390/sym16091227)
Citations: 0
Balance Controller Design for Inverted Pendulum Considering Detail Reward Function and Two-Phase Learning Protocol
As a complex nonlinear system, the inverted pendulum (IP) is asymmetric and inherently unstable. In this paper, the IP system is controlled by a learned deep neural network (DNN) that maps the system states directly to control commands in an end-to-end manner. Building on deep reinforcement learning (DRL), a detail reward function (DRF) is designed to guide the DNN as it learns the control strategy, which makes the control considerably more targeted and flexible. Moreover, a two-phase learning protocol, consisting of an offline learning phase and an online learning phase, is proposed to address the “reality gap” between the model used for training and the real IP system. First, the DNN learns an offline control strategy from a simplified IP dynamic model and the DRF. Then, a safety controller is designed and deployed on the IP platform so that the DNN can be further optimized online. The experimental results show that, after this second round of learning on the platform, the DNN is robust to model errors: when the pendulum length is reduced or increased by 25%, the steady-state error of the pendulum angle remains below 0.05 rad, which is within the allowable range, so the DNN is robust to changes in pendulum length. The DRF and the two-phase learning protocol improve the controller's adaptability to the complex and variable characteristics of the real platform and provide a reference for other learning-based robot control problems.
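To make the offline phase more concrete, here is a minimal Python sketch that pairs a simplified inverted-pendulum (cart-pole) model with a multi-term reward in the spirit of the DRF. It rests on stated assumptions: the dynamics are the classic frictionless cart-pole equations (the same ones used by the Gym/Gymnasium `CartPole` environment), not necessarily the authors' simplified model, and `detail_reward`, its weights and the upright bonus band are hypothetical, because the abstract does not list the DRF's terms. The values in `CartPoleParams` are illustrative only.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CartPoleParams:
    # Illustrative values only; the abstract does not give the platform's constants.
    gravity: float = 9.8       # m/s^2
    cart_mass: float = 1.0     # kg
    pole_mass: float = 0.1     # kg
    half_length: float = 0.5   # pivot to pole centre of mass (m)
    dt: float = 0.02           # explicit Euler step (s)


def step(state, force, p=CartPoleParams()):
    """One explicit Euler step of the frictionless cart-pole.

    state = (x, x_dot, theta, theta_dot), with theta = 0 meaning upright.
    """
    x, x_dot, theta, theta_dot = state
    total_mass = p.cart_mass + p.pole_mass
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    temp = (force + p.pole_mass * p.half_length * theta_dot ** 2 * sin_t) / total_mass
    theta_acc = (p.gravity * sin_t - cos_t * temp) / (
        p.half_length * (4.0 / 3.0 - p.pole_mass * cos_t ** 2 / total_mass))
    x_acc = temp - p.pole_mass * p.half_length * theta_acc * cos_t / total_mass
    return (x + p.dt * x_dot, x_dot + p.dt * x_acc,
            theta + p.dt * theta_dot, theta_dot + p.dt * theta_acc)


def detail_reward(state, force, w_theta=1.0, w_theta_dot=0.1, w_x=0.05,
                  w_force=0.001, upright_band=0.05):
    """Hypothetical multi-term reward: separately weighted penalties plus an upright bonus."""
    x, _, theta, theta_dot = state
    reward = -(w_theta * theta ** 2            # pole angle error
               + w_theta_dot * theta_dot ** 2  # angular velocity
               + w_x * x ** 2                  # cart drift from the track centre
               + w_force * force ** 2)         # control effort
    if abs(theta) < upright_band:
        reward += 1.0  # bonus for staying close to upright
    return reward
```

Passing `CartPoleParams(half_length=0.5 * 0.75)` or `CartPoleParams(half_length=0.5 * 1.25)` to `step` mimics the ±25% pendulum-length change reported in the abstract (only the length is scaled here, not the pole mass). Splitting the reward into separately weighted terms is one way a "detail" reward could be structured: each aspect of the behaviour (angle, angular rate, cart drift, control effort) can then be tuned on its own.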
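The online phase relies on a safety controller that protects the real platform while the DNN keeps learning. The abstract does not say how that controller works, so the following is only a hypothetical Python sketch of one common pattern, a supervisory switch (often called a "shield"): the learned policy acts inside an assumed safe region, and a fallback law takes over outside it. `pd_fallback`, `shielded_action`, `collect_online_episode`, the PD gains, the limits and the transition format are all assumptions, not the authors' design.

```python
from typing import Callable, List, Tuple

# (x, x_dot, theta, theta_dot) with theta = 0 meaning upright
State = Tuple[float, float, float, float]
Policy = Callable[[State], float]


def pd_fallback(state: State, kp: float = 40.0, kd: float = 5.0) -> float:
    """Hypothetical safety law: PD feedback on the pole angle only."""
    _, _, theta, theta_dot = state
    # In the classic cart-pole convention, pushing the cart toward +x drives
    # theta negative, so a positive force counters a positive tilt.
    return kp * theta + kd * theta_dot


def shielded_action(state: State, policy: Policy,
                    fallback: Policy = pd_fallback,
                    theta_limit: float = 0.2, x_limit: float = 1.5,
                    force_limit: float = 10.0) -> Tuple[float, bool]:
    """Return (force, overridden): the DNN acts inside the safe region, the fallback outside."""
    x, _, theta, _ = state
    in_safe_region = abs(theta) <= theta_limit and abs(x) <= x_limit
    force = policy(state) if in_safe_region else fallback(state)
    force = max(-force_limit, min(force_limit, force))  # actuator saturation
    return force, not in_safe_region


def collect_online_episode(env_step: Callable[[State, float], State],
                           initial_state: State, policy: Policy,
                           steps: int = 200) -> List[Tuple[State, float, State, bool]]:
    """Roll out on the platform, logging which controller was in charge at each step."""
    transitions = []
    state = initial_state
    for _ in range(steps):
        force, overridden = shielded_action(state, policy)
        next_state = env_step(state, force)
        transitions.append((state, force, next_state, overridden))
        state = next_state
    return transitions
```

Recording the `overridden` flag with each transition lets the online learner, for example, down-weight or separately handle samples collected while the fallback was in control. The PD fallback here acts only on the pole angle and is not guaranteed to keep the cart within its track.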