Authors: Hannah Lehman, John Valasek
DOI: 10.2514/1.i011284
Journal: Journal of Aerospace Information Systems, Vol. 14, No. 6
Publication date: 2023-11-14 (Journal Article)
Impact factor: 1.3; JCR Q2, Engineering, Aerospace
Design, Selection, and Evaluation of Reinforcement Learning Single Agents for Ground Target Tracking
Previous approaches for small fixed-wing unmanned air systems that carry strapdown rather than gimbaled cameras achieved satisfactory ground target tracking performance using both standard and deep reinforcement learning algorithms. However, these approaches impose significant restrictions on, and abstractions of, the vehicle dynamics, such as constant airspeed and constant altitude, because the number of states and actions was necessarily limited. Thus, extensive tuning was required to obtain good tracking performance. Expanding from 4 state–action degrees of freedom to 15 enabled the agent to exploit previous reward functions, producing novel yet undesirable emergent behavior. This paper investigates the causes of, and various potential solutions to, undesirable emergent behavior in the ground target tracking problem. A combination of changes to the environment, reward structure, action space simplification, command rate, and controller implementation provides insight into obtaining stable tracking results. Consideration is given to reward structure selection and refinement to mitigate undesirable emergent behavior. Results presented in the paper for a simulated environment of a single unmanned air system tracking a randomly moving single ground target show that a soft actor–critic algorithm can produce feasible tracking trajectories without limiting the state space and action space, provided that the environment is properly posed.
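To make the problem setup concrete, the following is a minimal sketch of the kind of episodic tracking environment the abstract describes: a vehicle commanded to follow a randomly moving ground target, with a dense reward tied to separation distance. All names, dynamics, and the reward shape here are illustrative assumptions — the paper's actual environment uses a 15 state–action degree-of-freedom fixed-wing model, and the paper's central point is that naive reward choices like this one can invite undesirable emergent behavior.

```python
import math
import random


class ToyTrackingEnv:
    """Toy stand-in for a ground-target-tracking RL environment.

    Hypothetical simplification: the vehicle is a 2D point mass and the
    target performs a bounded random walk. This is NOT the paper's
    formulation, which models a fixed-wing UAS with a strapdown camera.
    """

    def __init__(self, horizon=200, seed=0):
        self.horizon = horizon
        self.rng = random.Random(seed)

    def reset(self):
        self.t = 0
        self.uav = [0.0, 0.0]
        self.target = [self.rng.uniform(-5, 5), self.rng.uniform(-5, 5)]
        return self._obs()

    def _obs(self):
        # Observation: relative target position (what a strapdown camera
        # pipeline would ultimately estimate).
        return (self.target[0] - self.uav[0], self.target[1] - self.uav[1])

    def step(self, action):
        # action: (vx, vy) commanded velocity, clipped to unit max speed.
        vx = max(-1.0, min(1.0, action[0]))
        vy = max(-1.0, min(1.0, action[1]))
        self.uav[0] += vx
        self.uav[1] += vy
        # Target random walk, slower than the vehicle so tracking is feasible.
        self.target[0] += self.rng.uniform(-0.5, 0.5)
        self.target[1] += self.rng.uniform(-0.5, 0.5)
        self.t += 1
        dist = math.hypot(*self._obs())
        # Dense negative-distance reward: one simple shaping choice; reward
        # selection and refinement is exactly what the paper studies.
        reward = -dist
        done = self.t >= self.horizon
        return self._obs(), reward, done
```

An off-policy algorithm such as soft actor–critic would then be trained against `reset`/`step` rollouts of this interface; the paper's finding is that with a properly posed environment, such an agent tracks without the state and action spaces having to be artificially limited.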
Journal Introduction:
This Journal is devoted to the dissemination of original archival research papers describing new theoretical developments, novel applications, and case studies regarding advances in aerospace computing, information, and networks and communication systems that address aerospace-specific issues. Issues related to signal processing, electromagnetics, antenna theory, and the basic networking hardware transmission technologies of a network are not within the scope of this journal. Topics include aerospace systems and software engineering; verification and validation of embedded systems; the field known as ‘big data,’ data analytics, machine learning, and knowledge management for aerospace systems; human-automation interaction and systems health management for aerospace systems. Applications of autonomous systems, systems engineering principles, and safety and mission assurance are of particular interest. The Journal also features Technical Notes that discuss particular technical innovations or applications in the topics described above. Papers are also sought that rigorously review the results of recent research developments. In addition to original research papers and reviews, the journal publishes articles that review books, conferences, social media, and new educational modes applicable to the scope of the Journal.