Reinforcement learning control for a flapping-wing micro aerial vehicle with output constraint

IF 1.9 | Zone 4 (Computer Science) | JCR Q3, AUTOMATION & CONTROL SYSTEMS | Assembly Automation | Pub Date: 2022-10-27 | DOI: 10.1108/aa-05-2022-0140
Haifeng Huang, Xiaoyan Wu, Tingting Wang, Yongbin Sun, Qiang Fu
Citations: 1

Abstract

Purpose: This paper studies the application of reinforcement learning (RL) to the control of an output-constrained flapping-wing micro aerial vehicle (FWMAV) with system uncertainty.

Design/methodology/approach: A six-degrees-of-freedom hummingbird model is used, neglecting the inertial effects of the wings. An RL algorithm based on the actor–critic framework is applied, consisting of an actor network with an unknown policy gradient and a critic network with an unknown value function. Because neural networks (NNs) perform well in fitting nonlinearities and have optimum characteristics, an actor–critic NN optimization algorithm is designed in which the actor NN generates a policy and the critic NN approximates the cost function. In addition, to ensure safe and stable flight of the FWMAV, a barrier Lyapunov function is used to constrain the flight states to predefined regions. The stability of the system is analyzed using Lyapunov stability theory, and the feasibility of RL for the control of an FWMAV is verified through simulation.

Findings: The proposed RL control scheme achieves trajectory tracking of the FWMAV in the presence of output constraints and system uncertainty.

Originality/value: A novel RL algorithm based on the actor–critic framework is applied to the control of an FWMAV with system uncertainty. To keep the flight of the FWMAV stable and safe, the output constraint problem is addressed with barrier Lyapunov function-based control.
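The abstract does not state which form of barrier Lyapunov function (BLF) the authors use. As an illustration only, the sketch below assumes the common log-type BLF, V(e) = 0.5 ln(k_b^2 / (k_b^2 - e^2)), which grows without bound as the tracking error e approaches the constraint k_b. The scalar error dynamics, the disturbance, and the names `k_b`, `gain`, `blf`, `blf_grad`, and `simulate` are all hypothetical; this is not the paper's hummingbird model or its actor–critic controller.

```python
import math


def blf(e: float, k_b: float) -> float:
    """Log-type barrier Lyapunov function (assumed form, not from the paper)."""
    if abs(e) >= k_b:
        raise ValueError("error is outside the constrained region |e| < k_b")
    return 0.5 * math.log(k_b**2 / (k_b**2 - e**2))


def blf_grad(e: float, k_b: float) -> float:
    """Derivative dV/de = e / (k_b^2 - e^2); it blows up near the boundary."""
    if abs(e) >= k_b:
        raise ValueError("error is outside the constrained region |e| < k_b")
    return e / (k_b**2 - e**2)


def simulate(e0=0.8, k_b=1.0, gain=2.0, dt=1e-3, steps=5000):
    """Euler-integrate the toy dynamics e_dot = u + d with u = -gain * dV/de.

    The bounded disturbance d stands in for system uncertainty (hypothetical).
    Because dV/de grows near |e| = k_b, the control dominates d there and
    pushes the error back toward the interior of the constraint set.
    """
    e = e0
    history = [e]
    for i in range(steps):
        d = 0.5 * math.sin(0.01 * i)   # bounded disturbance, |d| <= 0.5
        u = -gain * blf_grad(e, k_b)   # barrier-gradient feedback
        e += dt * (u + d)
        history.append(e)
    return history


if __name__ == "__main__":
    traj = simulate()
    print("max |e| over the run:", max(abs(x) for x in traj))
    print("V at the final error:", blf(traj[-1], 1.0))
```

In this toy setting the error starts at 0.8 with a bound of 1.0 and never leaves the constrained region, even with the disturbance present. A full controller of the kind the abstract describes would embed such a barrier term in a Lyapunov-based design alongside the actor and critic networks.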
Source journal: Assembly Automation (Engineering & Technology; Engineering: Manufacturing)
CiteScore: 4.30
Self-citation rate: 14.30%
Articles published: 51
Review time: 3.3 months
Journal description: Assembly Automation publishes peer reviewed research articles, technology reviews and specially commissioned case studies. Each issue includes high quality content covering all aspects of assembly technology and automation, and reflecting the most interesting and strategically important research and development activities from around the world. Because of this, readers can stay at the very forefront of industry developments. All research articles undergo rigorous double-blind peer review, and the journal’s policy of not publishing work that has only been tested in simulation means that only the very best and most practical research articles are included. This ensures that the material that is published has real relevance and value for commercial manufacturing and research organizations.