Christian Bianchi, Lorenzo Niccolai, Giovanni Mengali
Acta Astronautica, vol. 226, pp. 702-715. Published 2024-11-09. DOI: 10.1016/j.actaastro.2024.10.065
https://www.sciencedirect.com/science/article/pii/S0094576524006398
Robust solar sail trajectories using proximal policy optimization
Reinforcement learning is used to design minimum-time trajectories of solar sails subject to the typical sources of uncertainty associated with such a propulsion system, i.e., inaccurate knowledge of the sail’s optical properties and the presence of wrinkles on the sail membrane. A proximal policy optimization (PPO) algorithm is used to train the agent and derive the control policy that associates the optimal sail attitude with each dynamic state. First, the agent is trained assuming deterministic unperturbed dynamics, and the results are compared with optimal solutions found by an indirect optimization method, thus demonstrating the effectiveness of this approach. Next, two stochastic scenarios are analysed. In the first, the optical coefficients of the sail are assumed to be random variables with Gaussian distribution, which leads to random variations in the sail characteristic acceleration. In the second scenario, wrinkles on the sail membrane are taken into account, resulting in a misalignment of the thrust vector with respect to a perfectly smooth surface. Both phenomena are modelled based on experimental measurements available in the literature in order to perform realistic analyses. In the stochastic scenarios, Monte Carlo simulations are performed using the trained policies, demonstrating that the reinforcement learning approach is capable of finding near time-optimal solutions, while also being robust to the sources of uncertainty considered.
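For readers unfamiliar with PPO, the following is a minimal sketch of the standard clipped surrogate objective that gives the algorithm its name. It is not taken from the paper: the function name `ppo_clip_objective`, the argument names, and the clipping parameter `clip_eps = 0.2` are illustrative assumptions, and the abstract does not state the authors' implementation or hyperparameters.

```python
import math

def ppo_clip_objective(log_prob_new, log_prob_old, advantages, clip_eps=0.2):
    """Average clipped surrogate objective (to be maximized).

    All names and the default clip_eps are illustrative assumptions,
    not values reported in the paper.
    """
    terms = []
    for lp_new, lp_old, adv in zip(log_prob_new, log_prob_old, advantages):
        # Probability ratio between the updated policy and the old policy.
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to the interval [1 - clip_eps, 1 + clip_eps].
        clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Take the more pessimistic of the unclipped and clipped terms.
        terms.append(min(ratio * adv, clipped_ratio * adv))
    return sum(terms) / len(terms)
```

Taking the minimum removes any incentive to move the policy far from the one that collected the data: once the ratio leaves the clipping interval in the direction favoured by the advantage, the objective stops improving.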
About the journal:
Acta Astronautica is sponsored by the International Academy of Astronautics. Content is based on original contributions in all fields of basic, engineering, life and social space sciences and of space technology related to:
The peaceful scientific exploration of space,
Its exploitation for human welfare and progress,
Conception, design, development and operation of space-borne and Earth-based systems.
In addition to regular issues, the journal publishes selected proceedings of the annual International Astronautical Congress (IAC), transactions of the IAA and special issues on topics of current interest, such as microgravity, space station technology, geostationary orbits, and space economics. Other subject areas include satellite technology, space transportation and communications, space energy, power and propulsion, astrodynamics, extraterrestrial intelligence and Earth observations.