A New Optimal Design of Stable Feedback Control of Two-Wheel System Based on Reinforcement Learning

Zhenghong Yu, Xuebin Zhu
{"title":"A New Optimal Design of Stable Feedback Control of Two-Wheel System Based on Reinforcement Learning","authors":"Zhenghong Yu, Xuebin Zhu","doi":"10.4271/13-05-01-0004","DOIUrl":null,"url":null,"abstract":"The two-wheel system design is widely used in various mobile tools, such as remote-control vehicles and robots, due to its simplicity and stability. However, the specific wheel and body models in the real world can be complex, and the control accuracy of existing algorithms may not meet practical requirements. To address this issue, we propose a double inverted pendulum on mobile device (DIPM) model to improve control performances and reduce calculations. The model is based on the kinetic and potential energy of the DIPM system, known as the Euler-Lagrange equation, and is composed of three second-order nonlinear differential equations derived by specifying Lagrange. We also propose a stable feedback control method for mobile device drive systems. Our experiments compare several mainstream reinforcement learning (RL) methods, including linear quadratic regulator (LQR) and iterative linear quadratic regulator (ILQR), as well as Q-learning, SARSA, DQN (Deep Q Network), and AC. The simulation results demonstrate that the DQN and AC methods are superior to ILQR in our designed nonlinear system. In all aspects of the test, the performance of Q-learning and SARSA is comparable to that of ILQR, with some slight improvements. However, ILQR shows its advantages at 10 deg and 20 deg. In the small deflection (between 5 and 10 deg), the DQN and AC methods perform 2% better than the traditional ILQR, and in the large deflection (10–30 deg), the DQN and AC methods perform 15% better than the traditional ILQR. Overall, RL not only has the advantages of strong versatility, wide application range, and parameter customization but also greatly reduces the difficulty of control system design and human investment, making it a promising field for future research.","PeriodicalId":181105,"journal":{"name":"SAE International Journal of Sustainable Transportation, Energy, Environment, & Policy","volume":"2 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"SAE International Journal of Sustainable Transportation, Energy, Environment, & Policy","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.4271/13-05-01-0004","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

The two-wheel system design is widely used in mobile platforms such as remote-control vehicles and robots because of its simplicity and stability. However, real-world wheel and body models can be complex, and the control accuracy of existing algorithms may not meet practical requirements. To address this issue, we propose a double inverted pendulum on mobile device (DIPM) model that improves control performance and reduces computation. The model is built from the kinetic and potential energy of the DIPM system via the Euler-Lagrange formulation and consists of three second-order nonlinear differential equations obtained from the Lagrangian. We also propose a stable feedback control method for mobile-device drive systems. Our experiments compare several mainstream reinforcement learning (RL) methods, namely Q-learning, SARSA, Deep Q-Network (DQN), and Actor-Critic (AC), against the classical linear quadratic regulator (LQR) and iterative linear quadratic regulator (ILQR) baselines. The simulation results show that the DQN and AC methods outperform ILQR on our designed nonlinear system. Across the tests, Q-learning and SARSA perform comparably to ILQR, with slight improvements, although ILQR retains an advantage at 10 deg and 20 deg. For small deflections (5-10 deg), the DQN and AC methods perform about 2% better than the traditional ILQR, and for large deflections (10-30 deg) they perform about 15% better. Overall, RL offers strong versatility, a wide application range, and customizable parameters, while greatly reducing the design effort and human investment required for control systems, making it a promising field for future research.
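As a rough illustration of the modeling approach summarized above (a sketch of the standard Lagrangian recipe, not the paper's exact derivation), a double inverted pendulum on a moving base can be described with generalized coordinates consisting of the base displacement s and the two link angles θ1 and θ2:

\[
\mathcal{L}(q,\dot q) = T(q,\dot q) - V(q),
\qquad
\frac{\mathrm{d}}{\mathrm{d}t}\!\left(\frac{\partial \mathcal{L}}{\partial \dot q_i}\right)
  - \frac{\partial \mathcal{L}}{\partial q_i} = Q_i,
\qquad q = (s,\ \theta_1,\ \theta_2),
\]

where T and V are the total kinetic and potential energy of the base and the two links, and Q_i is the generalized force (the drive force acting on the base, zero for the unactuated joints). Expanding the three Euler-Lagrange equations yields the three coupled second-order nonlinear differential equations mentioned in the abstract.

The comparison above pits RL controllers against LQR/ILQR baselines. The following minimal Python sketch shows how such an LQR baseline is typically computed for a linearized pendulum-on-cart model; the matrices A and B are illustrative placeholders for a simplified single-pendulum linearization, not the paper's DIPM dynamics, and SciPy is assumed to be available.

```python
# Illustrative LQR baseline for a linearized pendulum-on-cart model.
# A and B are placeholder matrices, NOT the paper's DIPM linearization.
import numpy as np
from scipy.linalg import solve_continuous_are

def lqr_gain(A, B, Q, R):
    """Solve the continuous-time algebraic Riccati equation and return the
    gain K such that u = -K x minimizes the integral of x'Qx + u'Ru."""
    P = solve_continuous_are(A, B, Q, R)
    return np.linalg.solve(R, B.T @ P)

# Placeholder state-space model, state x = [position, angle, velocity, angular velocity].
A = np.array([[0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0],
              [0.0, 2.0, 0.0, 0.0],
              [0.0, 15.0, 0.0, 0.0]])
B = np.array([[0.0], [0.0], [1.0], [-1.5]])
Q = np.diag([1.0, 10.0, 0.1, 0.1])   # penalize angular deviation most heavily
R = np.array([[0.01]])               # cheap control effort

K = lqr_gain(A, B, Q, R)

# Stabilizing state feedback u = -K x, evaluated at a 10-degree initial deflection.
x0 = np.array([0.0, np.deg2rad(10.0), 0.0, 0.0])
u0 = -K @ x0
print("LQR gain K:", K)
print("Initial control input:", u0)
```

An RL controller such as DQN or Actor-Critic would instead learn its feedback law from simulated interaction with the nonlinear DIPM dynamics, which is what allows it to outperform the linearization-based baseline at larger deflections, as reported in the abstract.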