Autonomous Guidance Between Quasiperiodic Orbits in Cislunar Space via Deep Reinforcement Learning

Journal of Spacecraft and Rockets; Pub Date: 2023-08-25; DOI: 10.2514/1.a35747; IF 1.3; Zone 4 (Engineering and Technology); JCR Q2 (Engineering, Aerospace)

Lorenzo Federici, A. Scorsoglio, Alessandro Zavoli, R. Furfaro


Citations: 0

Abstract



This paper investigates the use of reinforcement learning for the fuel-optimal guidance of a spacecraft during a time-free low-thrust transfer between two libration point orbits in the cislunar environment. To this aim, a deep neural network is trained via proximal policy optimization to map any spacecraft state to the optimal control action. A general-purpose reward is used to guide the network toward a fuel-optimal control law, regardless of the specific pair of libration orbits considered and without the use of any ad hoc reward shaping technique. Eventually, the learned control policies are compared with the optimal solutions provided by a direct method in two different mission scenarios, and Monte Carlo simulations are used to assess the policies’ robustness to navigation uncertainties.
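The abstract frames guidance as a policy that maps any spacecraft state to a thrust command, trained against a fuel-oriented reward. As a rough illustration of what one training-environment step for such a policy might look like, the sketch below assumes the low-thrust circular restricted three-body problem (CR3BP) in the nondimensional Earth-Moon rotating frame, a thrust held constant over each decision step, and a per-step reward equal to minus the propellant consumed. The function names `dynamics`, `env_step`, `jacobi_constant`, and `monte_carlo`, the mass-ratio value, the nondimensional exhaust velocity, and the Gaussian navigation-noise model are all hypothetical choices made for this example, not the authors' formulation; the PPO training loop itself and any terminal-constraint terms in the reward are omitted.

```python
import numpy as np

# Approximate Earth-Moon mass parameter mu = m_Moon / (m_Earth + m_Moon) (assumed value).
MU_EARTH_MOON = 0.01215


def dynamics(s, thrust, mu=MU_EARTH_MOON, c_exhaust=1.0):
    """Nondimensional low-thrust CR3BP in the Earth-Moon rotating frame.

    s = [x, y, z, vx, vy, vz, m]; `thrust` is a nondimensional force 3-vector;
    `c_exhaust` is a hypothetical nondimensional exhaust velocity.
    """
    x, y, z, vx, vy, vz, m = s
    r1 = np.sqrt((x + mu) ** 2 + y**2 + z**2)      # distance from Earth
    r2 = np.sqrt((x - 1 + mu) ** 2 + y**2 + z**2)  # distance from Moon
    a_thr = thrust / m                              # thrust acceleration
    ax = 2 * vy + x - (1 - mu) * (x + mu) / r1**3 - mu * (x - 1 + mu) / r2**3 + a_thr[0]
    ay = -2 * vx + y - (1 - mu) * y / r1**3 - mu * y / r2**3 + a_thr[1]
    az = -(1 - mu) * z / r1**3 - mu * z / r2**3 + a_thr[2]
    m_dot = -np.linalg.norm(thrust) / c_exhaust     # propellant mass flow rate
    return np.array([vx, vy, vz, ax, ay, az, m_dot])


def env_step(s, thrust, dt, **kw):
    """One decision step: thrust held constant, state advanced with classic RK4.

    Reward = minus the propellant consumed in the step (an assumed fuel-based
    reward; terminal constraint terms are omitted).
    """
    k1 = dynamics(s, thrust, **kw)
    k2 = dynamics(s + 0.5 * dt * k1, thrust, **kw)
    k3 = dynamics(s + 0.5 * dt * k2, thrust, **kw)
    k4 = dynamics(s + dt * k3, thrust, **kw)
    s_next = s + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    reward = -(s[6] - s_next[6])
    return s_next, reward


def jacobi_constant(s, mu=MU_EARTH_MOON):
    """Jacobi integral C = 2U - v^2, conserved along unforced (coasting) arcs."""
    x, y, z, vx, vy, vz, _ = s
    r1 = np.sqrt((x + mu) ** 2 + y**2 + z**2)
    r2 = np.sqrt((x - 1 + mu) ** 2 + y**2 + z**2)
    U = 0.5 * (x**2 + y**2) + (1 - mu) / r1 + mu / r2
    return 2 * U - (vx**2 + vy**2 + vz**2)


def monte_carlo(policy, s0, n_steps, dt, sigma, n_runs, seed=0):
    """Fly `policy` n_runs times, feeding it a state corrupted by Gaussian
    navigation noise (hypothetical noise model) while propagating the true state."""
    rng = np.random.default_rng(seed)
    finals = []
    for _ in range(n_runs):
        s = s0.copy()
        for _ in range(n_steps):
            obs = s.copy()
            obs[:6] += rng.normal(0.0, sigma, 6)  # noisy position/velocity estimate
            s, _ = env_step(s, policy(obs), dt)
        finals.append(s)
    return np.array(finals)
```

With zero thrust, `jacobi_constant` should stay constant up to integration error, which gives a quick check of the propagator. In a PPO setup, `policy` would be the trained network and an episode would terminate on reaching the target orbit; both are outside the scope of this sketch.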


Source journal: Journal of Spacecraft and Rockets (Engineering and Technology: Engineering, Aerospace)

CiteScore: 3.60
Self-citation rate: 18.80%
Articles published: 185
Review time: 4.5 months

Journal description: This journal, which started it all back in 1963, is devoted to the advancement of the science and technology of astronautics and aeronautics through the dissemination of original archival research papers disclosing new theoretical developments and/or experimental results. The topics include aeroacoustics, aerodynamics, combustion, fundamentals of propulsion, fluid mechanics and reacting flows, fundamental aspects of the aerospace environment, hydrodynamics, lasers and associated phenomena, plasmas, research instrumentation and facilities, structural mechanics and materials, optimization, and thermomechanics and thermochemistry. Papers that review in depth the results of recent research developments on any of the topics listed above are also sought.