Title: Two-stage deep reinforcement learning method for agile optical satellite scheduling problem
Authors: Zheng Liu, Wei Xiong, Zhuoya Jia, Chi Han
Journal: Complex & Intelligent Systems (JCR Q1, Computer Science, Artificial Intelligence; impact factor 5.0)
Publication type: Journal Article
Published: 2024-11-14
DOI: https://doi.org/10.1007/s40747-024-01667-x
Citations: 0
Abstract
This paper investigates the agile optical satellite scheduling problem, which aims to arrange an observation sequence and observation actions for a set of observation tasks. Existing research mainly maximizes the number of completed tasks or their total priority, but ignores the influence of observation actions on imaging quality. Moreover, conventional exact and heuristic methods can hardly obtain a high-quality solution in a short time because of the problem's complicated constraints and large solution space. This paper therefore proposes a two-stage scheduling framework based on two-stage deep reinforcement learning. First, the scheduling process is decomposed into a task sequencing stage and an observation scheduling stage, and a mathematical model with complex constraints and two-stage optimization objectives is established to describe the problem. Then, in the task sequencing stage, a pointer network with a local selection mechanism and a rough pruning mechanism is constructed as the sequencing network to generate an executable task sequence. Next, in the observation scheduling stage, a decomposition strategy splits the executable task sequence into multiple sub-sequences, and the observation scheduling process of these sub-sequences is modeled as a concatenated Markov decision process. A neural network, trained with the soft actor-critic algorithm, is designed as the observation scheduling network to determine observation actions for the sequenced tasks. Finally, extensive experiments show that the proposed method, along with the designed mechanisms and strategy, is superior to the comparison algorithms in terms of solution quality, generalization performance, and computational efficiency.
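The abstract does not spell out how the local selection mechanism works, so the following is only a minimal sketch of one plausible reading: at each decoding step, a pointer-style softmax is restricted to a small window of tasks that follow the most recently selected one. The function names, the `window` parameter, the forward-only rule, and the fixed score list are all assumptions made for illustration; in an actual pointer network the scores would be recomputed from the decoder state at every step.

```python
import math


def masked_pointer_step(scores, current, window):
    """Choose the next task from pointer scores under a local mask.

    scores  -- one raw score per task, tasks assumed sorted by start time
               (hypothetical values, fixed here for simplicity)
    current -- index of the most recently selected task, -1 at the start
    window  -- how many tasks after `current` remain selectable
               (assumed form of a "local selection" rule)
    """
    candidates = [
        i for i in range(len(scores))
        if current < i <= current + window
    ]
    if not candidates:
        return None, {}
    # Softmax over the candidates only; masked tasks get no probability.
    peak = max(scores[i] for i in candidates)
    weights = {i: math.exp(scores[i] - peak) for i in candidates}
    total = sum(weights.values())
    probs = {i: w / total for i, w in weights.items()}
    # Greedy decoding: take the most probable candidate.
    best = max(probs, key=probs.get)
    return best, probs


def greedy_sequence(scores, window):
    """Build a task sequence by repeating the masked selection step."""
    order, current = [], -1
    while True:
        nxt, _ = masked_pointer_step(scores, current, window)
        if nxt is None:
            return order
        order.append(nxt)
        current = nxt


if __name__ == "__main__":
    print(greedy_sequence([0.1, 2.0, 0.5, 1.5, 0.3], window=2))
```

Restricting the softmax to a window keeps each decision cheap and prevents the decoder from jumping to distant tasks, which is the general motivation for local masking; whether the paper's mechanism uses this exact rule is not stated in the abstract.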
About the journal:
Complex & Intelligent Systems aims to provide a forum for presenting and discussing novel approaches, tools and techniques meant for attaining a cross-fertilization between the broad fields of complex systems, computational simulation, and intelligent analytics and visualization. The transdisciplinary research that the journal focuses on will expand the boundaries of our understanding by investigating the principles and processes that underlie many of the most profound problems facing society today.