This study proposes a deep reinforcement learning-based optimization framework for integrated train scheduling and rolling stock circulation planning under dynamic passenger demand. The problem is formulated as a Markov decision process (MDP) with a hybrid action space that jointly captures continuous timetable decisions and discrete rolling stock allocations. The objective is to minimize passenger waiting time and operator costs while adhering to complex operational constraints. To coordinate continuous and discrete decision variables in this high-dimensional operational context, we adopt a Hybrid Proximal Policy Optimization (HPPO) algorithm that employs separate actor networks for the discrete and continuous actions, together with constraint-handling techniques such as action masking and action space embedding. Furthermore, a potential-based reward shaping function is introduced to improve learning efficiency by mitigating sparse and delayed rewards. The proposed approach is validated on the Beijing Metro Changping Line, and experimental results demonstrate that the HPPO algorithm effectively improves system efficiency and policy robustness.
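As a concrete illustration of the hybrid policy structure described above, the sketch below shows one plausible way to realize separate discrete and continuous actor heads with action masking, plus a potential-based shaping term. It is a minimal sketch assuming a PyTorch implementation; the class name `HybridActor`, the layer sizes, the state encoding, and the potential values `phi_s`/`phi_next` are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical, Normal

class HybridActor(nn.Module):
    """Shared state encoder with two heads: a categorical head for discrete
    rolling stock allocations and a Gaussian head for continuous timetable
    decisions (e.g., departure-time adjustments)."""

    def __init__(self, state_dim: int, n_discrete: int, cont_dim: int, hidden: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        self.discrete_head = nn.Linear(hidden, n_discrete)  # logits over allocations
        self.cont_mean = nn.Linear(hidden, cont_dim)        # mean of timetable adjustments
        self.cont_log_std = nn.Parameter(torch.zeros(cont_dim))

    def forward(self, state: torch.Tensor, feasible: torch.Tensor):
        """`feasible` is a boolean tensor marking operationally valid discrete
        actions; infeasible ones get -inf logits (action masking), so they
        receive zero probability mass."""
        h = self.encoder(state)
        logits = self.discrete_head(h).masked_fill(~feasible, float("-inf"))
        disc_dist = Categorical(logits=logits)
        cont_dist = Normal(self.cont_mean(h), self.cont_log_std.exp())
        return disc_dist, cont_dist
        # PPO uses the joint log-probability of both heads, e.g.:
        #   log_prob = disc_dist.log_prob(a_d) + cont_dist.log_prob(a_c).sum(-1)

def shaped_reward(r: float, phi_s: float, phi_next: float, gamma: float = 0.99) -> float:
    """Potential-based reward shaping: augments the (possibly sparse) reward
    with gamma * Phi(s') - Phi(s), a form known to preserve the optimal
    policy (Ng et al., 1999). The choice of Phi is problem-specific, e.g. a
    function of accumulated passenger waiting time."""
    return r + gamma * phi_next - phi_s
```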