The scheduling of carrier-based aircraft support operations critically impacts aircraft carrier operational effectiveness by determining sortie generation rates, yet it faces significant challenges in complex deck environments characterized by resource coupling, dynamic constraints, and high-dimensional state-action spaces. Traditional optimization algorithms and vanilla reinforcement learning (RL) suffer from computational inefficiency, sparse rewards, and limited adaptability to dynamic scenarios, while expert-knowledge-based systems are constrained by the quality of the underlying expertise, and poor expert guidance can even degrade performance. To address these limitations, this paper proposes a human experience-guided actor-critic reinforcement learning framework that synergizes domain expertise with adaptive learning. First, a dynamic Markov decision process (MDP) model is developed to rigorously simulate carrier deck operations, explicitly encoding constraints on positions, resources, and collision avoidance. Building upon this foundation, a human experience database is constructed to enable real-time pattern-matching-based intervention during agent-environment interactions, dynamically correcting erroneous actions to avoid catastrophic states while improving exploration efficiency. Finally, the policy and value network objectives are reshaped to incorporate human intent through hybrid reward functions and adaptive guidance weighting, ensuring a balanced integration of expert knowledge with RL's exploration capabilities. Extensive simulations across three scenarios show that the proposed framework outperforms state-of-the-art methods and remains robust under suboptimal human guidance. These results validate the framework's ability to harmonize human expertise with adaptive learning, offering a practical solution for real-world carrier deck operations.
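
As a rough illustration of the intervention and hybrid-reward mechanisms summarized above, the following Python sketch shows one possible realization of pattern-matching expert intervention with an adaptive guidance weight. The `ExperienceDB` nearest-neighbor matching, the exponential weight schedule, and the imitation bonus are illustrative assumptions for exposition only, not the paper's actual implementation.

```python
import numpy as np


class ExperienceDB:
    """Toy pattern-matching database of (state pattern, expert action) pairs."""

    def __init__(self, patterns, actions, threshold=0.5):
        self.patterns = np.asarray(patterns, dtype=float)   # (K, state_dim)
        self.actions = np.asarray(actions, dtype=int)        # (K,)
        self.threshold = threshold

    def match(self, state):
        """Return the expert action whose pattern is closest to `state`,
        or None when no stored pattern is close enough (no intervention)."""
        dists = np.linalg.norm(self.patterns - state, axis=1)
        k = int(np.argmin(dists))
        return int(self.actions[k]) if dists[k] < self.threshold else None


def guidance_weight(step, w0=1.0, decay=1e-3):
    """Adaptive guidance weight: strong expert influence early in training,
    fading as the agent's own policy matures (assumed exponential schedule)."""
    return w0 * np.exp(-decay * step)


def hybrid_reward(env_reward, agent_action, expert_action, w):
    """Blend the environment reward with an imitation bonus that rewards
    agreement with the matched expert action (assumed linear blend)."""
    if expert_action is None:
        return env_reward
    imitation = 1.0 if agent_action == expert_action else -1.0
    return (1.0 - w) * env_reward + w * imitation


def guided_step(policy_probs, state, db, step, rng):
    """One agent-environment interaction with possible expert intervention:
    a matched expert action overrides the agent's choice with probability w."""
    agent_action = int(rng.choice(len(policy_probs), p=policy_probs))
    expert_action = db.match(state)
    w = guidance_weight(step)
    if expert_action is not None and rng.random() < w:
        executed = expert_action          # expert correction applied
    else:
        executed = agent_action           # agent's own exploration kept
    return executed, agent_action, expert_action, w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    db = ExperienceDB(patterns=[[0.0, 0.0], [1.0, 1.0]], actions=[0, 1])
    state = np.array([0.1, -0.2])
    probs = np.array([0.3, 0.7])
    executed, agent_a, expert_a, w = guided_step(probs, state, db, step=100, rng=rng)
    r = hybrid_reward(env_reward=0.5, agent_action=agent_a,
                      expert_action=expert_a, w=w)
    print(executed, r)
```

In this sketch the same weight w both gates how often the expert overrides the agent and scales the imitation term in the hybrid reward, so expert influence on action selection and on the learning signal decays together as training progresses; the full framework in the paper additionally reshapes the policy and value network objectives.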