Design, Selection, and Evaluation of Reinforcement Learning Single Agents for Ground Target Tracking

IF 1.3 | Zone 4, Engineering Technology | JCR Q2, ENGINEERING, AEROSPACE | Journal of Aerospace Information Systems | Pub Date: 2023-11-14 | DOI: 10.2514/1.i011284
Hannah Lehman, John Valasek
Citations: 0

Abstract

Previous approaches for small fixed-wing unmanned air systems that carry strapdown rather than gimbaled cameras achieved satisfactory ground target tracking performance using both standard and deep reinforcement learning algorithms. However, these approaches have significant restrictions and abstractions to the dynamics of the vehicle, such as constant airspeed and constant altitude, because the number of states and actions was necessarily limited. Thus, extensive tuning was required to obtain good tracking performance. The expansion from 4 state–action degrees of freedom to 15 enabled the agent to exploit previous reward functions that produced novel yet undesirable emergent behavior. This paper investigates the causes of and various potential solutions to undesirable emergent behavior in the ground target tracking problem. A combination of changes to the environment, reward structure, action space simplification, command rate, and controller implementation provides insight into obtaining stable tracking results. Consideration is given to reward structure selection and refinement to mitigate undesirable emergent behavior. Results presented in the paper for a simulated environment of a single unmanned air system tracking a randomly moving single ground target show that a soft actor–critic algorithm can produce feasible tracking trajectories without limiting the state space and action space, provided that the environment is properly posed.
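To make the abstract's point about reward structure concrete, the following is a minimal sketch of one kind of shaped reward a tracking agent with a fixed (strapdown) camera could be trained on. It is not taken from the paper: the function name `tracking_reward`, the normalized image-plane coordinates, the linear falloff, and the `miss_penalty` value are all assumptions chosen for illustration.

```python
import math


def tracking_reward(x_img, y_img, half_width=1.0, half_height=1.0, miss_penalty=-1.0):
    """Hypothetical shaped reward for image-plane target tracking.

    x_img, y_img: target position in the image plane, with (0, 0) at the
    image center (assumed convention, not from the paper).
    Returns a value in [0, 1] while the target is in frame (1 at the center,
    0 at a corner) and miss_penalty when the target is outside the frame.
    """
    # Target outside the camera frame: return the flat penalty.
    if abs(x_img) > half_width or abs(y_img) > half_height:
        return miss_penalty
    # Normalized distance from the image center: 0 at center, 1 at a corner.
    dist = math.hypot(x_img / half_width, y_img / half_height) / math.sqrt(2.0)
    return 1.0 - dist


if __name__ == "__main__":
    for point in [(0.0, 0.0), (0.5, 0.0), (1.0, 1.0), (2.0, 0.0)]:
        print(point, tracking_reward(*point))
```

A reward of this shape only scores where the target appears in the image. An agent with more state and action freedom could, in principle, maximize it through maneuvers the designer never intended, which is the kind of emergent behavior the abstract says has to be handled through reward refinement and a properly posed environment.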
Source journal
CiteScore: 3.70
Self-citation rate: 13.30%
Articles published: 58
Review time: >12 weeks
Journal description: This Journal is devoted to the dissemination of original archival research papers describing new theoretical developments, novel applications, and case studies regarding advances in aerospace computing, information, and networks and communication systems that address aerospace-specific issues. Issues related to signal processing, electromagnetics, antenna theory, and the basic networking hardware transmission technologies of a network are not within the scope of this journal. Topics include aerospace systems and software engineering; verification and validation of embedded systems; the field known as ‘big data,’ data analytics, machine learning, and knowledge management for aerospace systems; human-automation interaction and systems health management for aerospace systems. Applications of autonomous systems, systems engineering principles, and safety and mission assurance are of particular interest. The Journal also features Technical Notes that discuss particular technical innovations or applications in the topics described above. Papers are also sought that rigorously review the results of recent research developments. In addition to original research papers and reviews, the journal publishes articles that review books, conferences, social media, and new educational modes applicable to the scope of the Journal.
Latest articles in this journal
New Type-2-Fuzzy-Logic-Based Control System for the Cessna Citation X
Basic Engagement Zones
Advanced Wavelet Transform-Based Automated System for Drone State Identification Using Radio-Frequency Signal
Integration of the Functional Hazard Assessment Within a Model-Based Systems Engineering Framework
Safe Spacecraft Inspection via Deep Reinforcement Learning and Discrete Control Barrier Functions