The orbital inspection game (OIG) is characterized by strong coupling with observational conditions and greater complexity than classical pursuit-evasion games. This study investigates the OIG using reinforcement learning. Building on the Markov decision process and its hybrid variant, a two-stage decision-making model for the spacecraft is proposed that decomposes the OIG into an approach task and an inspection task. With a purpose-designed hierarchical network architecture and a gradient-driven exploration strategy, the gradient-driven exploration-based serial solution method (GDE-SSM) is developed. GDE-SSM stratifies the agent's exploration space through model switching and task decomposition, thereby significantly improving training effectiveness. Compared with the prediction-reward-detection multi-agent deep deterministic policy gradient algorithm and the deep deterministic policy gradient algorithm, the defender trained via GDE-SSM tends to adopt more aggressive maneuvering strategies, yielding average success-rate improvements of 57.8% and 21.8%, respectively. Generalization and robustness analyses demonstrate that GDE-SSM remains robust to uncertainties arising from environmental parameters, suboptimal observation conditions, and abrupt disturbances.
