Authors: Hannah Lehman, John Valasek
DOI: 10.2514/1.i011284
Journal: Journal of Aerospace Information Systems, Vol. 14, No. 6
Publication date: 2023-11-14 (Journal Article)
Impact factor: 1.3; JCR Q2, Engineering, Aerospace
Design, Selection, and Evaluation of Reinforcement Learning Single Agents for Ground Target Tracking
Previous approaches for small fixed-wing unmanned air systems that carry strapdown rather than gimbaled cameras achieved satisfactory ground target tracking performance using both standard and deep reinforcement learning algorithms. However, these approaches impose significant restrictions on, and abstractions of, the vehicle dynamics, such as constant airspeed and constant altitude, because the number of states and actions was necessarily limited. Thus, extensive tuning was required to obtain good tracking performance. Expanding from 4 state–action degrees of freedom to 15 enabled the agent to exploit previous reward functions, producing novel yet undesirable emergent behavior. This paper investigates the causes of, and various potential solutions to, undesirable emergent behavior in the ground target tracking problem. A combination of changes to the environment, reward structure, action space simplification, command rate, and controller implementation provides insight into obtaining stable tracking results. Consideration is given to reward structure selection and refinement to mitigate undesirable emergent behavior. Results presented in the paper for a simulated environment of a single unmanned air system tracking a randomly moving single ground target show that a soft actor–critic algorithm can produce feasible tracking trajectories without limiting the state space and action space, provided that the environment is properly posed.
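To make the problem setup concrete, the following is a minimal sketch of the kind of episodic tracking environment the abstract describes: a vehicle commanded to follow a randomly moving ground target, with a dense reward tied to separation distance. All names, dynamics, and the reward shape here are illustrative assumptions — the paper's actual environment uses a 15 state–action degree-of-freedom fixed-wing model, and the paper's central point is that naive reward choices like this one can invite undesirable emergent behavior.

```python
import math
import random


class ToyTrackingEnv:
    """Toy stand-in for a ground-target-tracking RL environment.

    Hypothetical simplification: the vehicle is a 2D point mass and the
    target performs a bounded random walk. This is NOT the paper's
    formulation, which models a fixed-wing UAS with a strapdown camera.
    """

    def __init__(self, horizon=200, seed=0):
        self.horizon = horizon
        self.rng = random.Random(seed)

    def reset(self):
        self.t = 0
        self.uav = [0.0, 0.0]
        self.target = [self.rng.uniform(-5, 5), self.rng.uniform(-5, 5)]
        return self._obs()

    def _obs(self):
        # Observation: relative target position (what a strapdown camera
        # pipeline would ultimately estimate).
        return (self.target[0] - self.uav[0], self.target[1] - self.uav[1])

    def step(self, action):
        # action: (vx, vy) commanded velocity, clipped to unit max speed.
        vx = max(-1.0, min(1.0, action[0]))
        vy = max(-1.0, min(1.0, action[1]))
        self.uav[0] += vx
        self.uav[1] += vy
        # Target random walk, slower than the vehicle so tracking is feasible.
        self.target[0] += self.rng.uniform(-0.5, 0.5)
        self.target[1] += self.rng.uniform(-0.5, 0.5)
        self.t += 1
        dist = math.hypot(*self._obs())
        # Dense negative-distance reward: one simple shaping choice; reward
        # selection and refinement is exactly what the paper studies.
        reward = -dist
        done = self.t >= self.horizon
        return self._obs(), reward, done
```

An off-policy algorithm such as soft actor–critic would then be trained against `reset`/`step` rollouts of this interface; the paper's finding is that with a properly posed environment, such an agent tracks without the state and action spaces having to be artificially limited.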
Journal Introduction:
This Journal is devoted to the dissemination of original archival research papers describing new theoretical developments, novel applications, and case studies regarding advances in aerospace computing, information, and networks and communication systems that address aerospace-specific issues. Issues related to signal processing, electromagnetics, antenna theory, and the basic networking hardware transmission technologies of a network are not within the scope of this journal. Topics include aerospace systems and software engineering; verification and validation of embedded systems; the field known as ‘big data,’ data analytics, machine learning, and knowledge management for aerospace systems; human-automation interaction and systems health management for aerospace systems. Applications of autonomous systems, systems engineering principles, and safety and mission assurance are of particular interest. The Journal also features Technical Notes that discuss particular technical innovations or applications in the topics described above. Papers are also sought that rigorously review the results of recent research developments. In addition to original research papers and reviews, the journal publishes articles that review books, conferences, social media, and new educational modes applicable to the scope of the Journal.