A multi-task deep reinforcement learning approach to real-time railway train rescheduling

Tao Tang, Simin Chai, Wei Wu, Jiateng Yin, Andrea D’Ariano
{"title":"铁路列车实时调度的多任务深度强化学习方法","authors":"Tao Tang, Simin Chai, Wei Wu, Jiateng Yin, Andrea D’Ariano","doi":"10.1016/j.tre.2024.103900","DOIUrl":null,"url":null,"abstract":"In high-speed railway systems, unexpected disruptions can result in delays of trains, significantly affecting the quality of service for passengers. Train Timetable Rescheduling (TTR) is a crucial task in the daily operation of high-speed railways to maintain punctuality and efficiency in the face of such unforeseen disruptions. Most existing studies on TTR are based on integer programming (IP) techniques and are required to solve IP models repetitively in case of disruptions, which however may be very time-consuming and greatly limit their usefulness in practice. Our study first proposes a multi-task deep reinforcement learning (MDRL) approach for TTR. Our MDRL is constructed and trained offline with a large number of historical disruptive events, enabling to generate TTR decisions in real-time for different disruption cases. Specifically, we transform the TTR problem into a Markov decision process considering the retiming and rerouting of trains. Then, we construct the MDRL framework with the definition of state, action, transition, reward, and value function approximations with neural networks for each agent (i.e., rail train), by considering the information of different disruption events as tasks. To overcome the low training efficiency and huge memory usage in the training of MDRL, given a large number of disruptive events in the historical data, we develop a new and high-efficient training method based on a Quadratic assignment programming (QAP) model and a Frank-Wolfe-based algorithm. Our QAP model optimizes only a small number but most “representative” tasks from the historical data, while the Frank-Wolfe-based algorithm approximates the nonlinear terms in the value function of MDRL and updates the model parameters among different training tasks concurrently. Finally, based on the real-world data from the Beijing–Zhangjiakou high-speed railway systems, we evaluate the performance of our MDRL approach by benchmarking it against state-of-the-art approaches in the literature. Our computational results demonstrate that an offline-trained MDRL is able to generate near-optimal TTR solutions in real-time against different disruption scenarios, and it evidently outperforms state-of-art models regarding solution quality and computational time.","PeriodicalId":49418,"journal":{"name":"Transportation Research Part E-Logistics and Transportation Review","volume":"50 1","pages":""},"PeriodicalIF":8.3000,"publicationDate":"2024-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A multi-task deep reinforcement learning approach to real-time railway train rescheduling\",\"authors\":\"Tao Tang, Simin Chai, Wei Wu, Jiateng Yin, Andrea D’Ariano\",\"doi\":\"10.1016/j.tre.2024.103900\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In high-speed railway systems, unexpected disruptions can result in delays of trains, significantly affecting the quality of service for passengers. Train Timetable Rescheduling (TTR) is a crucial task in the daily operation of high-speed railways to maintain punctuality and efficiency in the face of such unforeseen disruptions. 
Most existing studies on TTR are based on integer programming (IP) techniques and are required to solve IP models repetitively in case of disruptions, which however may be very time-consuming and greatly limit their usefulness in practice. Our study first proposes a multi-task deep reinforcement learning (MDRL) approach for TTR. Our MDRL is constructed and trained offline with a large number of historical disruptive events, enabling to generate TTR decisions in real-time for different disruption cases. Specifically, we transform the TTR problem into a Markov decision process considering the retiming and rerouting of trains. Then, we construct the MDRL framework with the definition of state, action, transition, reward, and value function approximations with neural networks for each agent (i.e., rail train), by considering the information of different disruption events as tasks. To overcome the low training efficiency and huge memory usage in the training of MDRL, given a large number of disruptive events in the historical data, we develop a new and high-efficient training method based on a Quadratic assignment programming (QAP) model and a Frank-Wolfe-based algorithm. Our QAP model optimizes only a small number but most “representative” tasks from the historical data, while the Frank-Wolfe-based algorithm approximates the nonlinear terms in the value function of MDRL and updates the model parameters among different training tasks concurrently. Finally, based on the real-world data from the Beijing–Zhangjiakou high-speed railway systems, we evaluate the performance of our MDRL approach by benchmarking it against state-of-the-art approaches in the literature. Our computational results demonstrate that an offline-trained MDRL is able to generate near-optimal TTR solutions in real-time against different disruption scenarios, and it evidently outperforms state-of-art models regarding solution quality and computational time.\",\"PeriodicalId\":49418,\"journal\":{\"name\":\"Transportation Research Part E-Logistics and Transportation Review\",\"volume\":\"50 1\",\"pages\":\"\"},\"PeriodicalIF\":8.3000,\"publicationDate\":\"2024-12-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Transportation Research Part E-Logistics and Transportation Review\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://doi.org/10.1016/j.tre.2024.103900\",\"RegionNum\":1,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ECONOMICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Transportation Research Part E-Logistics and Transportation Review","FirstCategoryId":"5","ListUrlMain":"https://doi.org/10.1016/j.tre.2024.103900","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ECONOMICS","Score":null,"Total":0}
引用次数: 0

Abstract

In high-speed railway systems, unexpected disruptions can delay trains and significantly degrade the quality of service for passengers. Train Timetable Rescheduling (TTR) is therefore a crucial task in the daily operation of high-speed railways: it maintains punctuality and efficiency in the face of such unforeseen disruptions. Most existing studies on TTR are based on integer programming (IP) techniques and must re-solve their IP models whenever a disruption occurs, which can be very time-consuming and greatly limits their usefulness in practice. Our study is the first to propose a multi-task deep reinforcement learning (MDRL) approach for TTR. The MDRL model is constructed and trained offline on a large number of historical disruptive events, enabling it to generate TTR decisions in real time for different disruption cases. Specifically, we transform the TTR problem into a Markov decision process that considers the retiming and rerouting of trains. We then construct the MDRL framework by defining the state, action, transition, reward, and neural-network value function approximation for each agent (i.e., each train), treating the information of different disruption events as tasks. To overcome the low training efficiency and large memory usage that arise when training MDRL on a large number of historical disruptive events, we develop a new, highly efficient training method based on a quadratic assignment programming (QAP) model and a Frank-Wolfe-based algorithm. The QAP model selects only a small number of the most "representative" tasks from the historical data for optimization, while the Frank-Wolfe-based algorithm approximates the nonlinear terms in the value function of MDRL and updates the model parameters across different training tasks concurrently. Finally, using real-world data from the Beijing–Zhangjiakou high-speed railway system, we evaluate the performance of our MDRL approach by benchmarking it against state-of-the-art approaches from the literature. Our computational results demonstrate that the offline-trained MDRL generates near-optimal TTR solutions in real time across different disruption scenarios and clearly outperforms state-of-the-art models in both solution quality and computational time.
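The abstract names the ingredients of the per-train decision process (state, action, transition, reward, value function) but not their concrete encodings. The sketch below is a minimal illustration of how such a formulation might look in Python; all field names, the discrete track indexing, and the delay-based reward are assumptions made for exposition, not the paper's actual definitions.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical encoding of one train-agent's view of the rescheduling MDP.
# Every name and field here is an illustrative assumption.

@dataclass
class State:
    train_id: int
    current_block: int    # index of the track section the train occupies
    delay_min: float      # deviation (minutes) from the planned timetable
    disruption: List[float] = field(default_factory=list)  # features of the active disruption event (the "task")

@dataclass
class Action:
    retime_min: float     # departure-time shift applied at the next station
    reroute_track: int    # index of the alternative track/platform chosen

def reward(prev: State, nxt: State) -> float:
    """Illustrative reward: penalize any growth in delay, so that the
    cumulative return approximates the negative total delay accumulated
    over the rescheduling horizon."""
    return -(nxt.delay_min - prev.delay_min)
```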
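The abstract also states that training concentrates on a small set of QAP-selected "representative" tasks and uses a Frank-Wolfe-based algorithm to handle the nonlinear terms in the value function, updating parameters across tasks concurrently. The paper's exact objective is not given here; the following is a generic Frank-Wolfe (conditional gradient) iteration over the probability simplex, one plausible shape for jointly weighting training tasks. The objective, its gradient, and the simplex feasible set are assumptions.

```python
import numpy as np

def frank_wolfe_simplex(grad_fn, n_tasks: int, iters: int = 100) -> np.ndarray:
    """Generic Frank-Wolfe iteration on the probability simplex.
    grad_fn(w) must return the gradient of a smooth training objective at
    the current task weights w; the objective itself is left abstract."""
    w = np.full(n_tasks, 1.0 / n_tasks)   # start at the simplex center
    for k in range(iters):
        g = grad_fn(w)
        s = np.zeros(n_tasks)
        s[np.argmin(g)] = 1.0             # linear minimizer over the simplex is a vertex
        gamma = 2.0 / (k + 2.0)           # standard diminishing step size
        w = (1.0 - gamma) * w + gamma * s # convex combination stays feasible
    return w

# Example usage with a toy quadratic objective f(w) = ||A w - b||^2 over 5 tasks.
rng = np.random.default_rng(0)
A, b = rng.normal(size=(8, 5)), rng.normal(size=8)
w_star = frank_wolfe_simplex(lambda w: 2 * A.T @ (A @ w - b), n_tasks=5)
print(w_star.round(3), w_star.sum())      # weights remain on the simplex
```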
Source journal

Transportation Research Part E: Logistics and Transportation Review
CiteScore: 16.20 · Self-citation rate: 16.00% · Articles per year: 285 · Average review time: 62 days

Transportation Research Part E: Logistics and Transportation Review publishes high-quality articles across logistics and transportation research, including transport economics, transport infrastructure and investment appraisal, evaluation of public policies related to transportation, empirical and analytical studies of logistics management practices and performance, logistics and operations models, and logistics and supply chain management. Its content is complementary to Transportation Research Part A: Policy and Practice, Part B: Methodological, Part C: Emerging Technologies, Part D: Transport and Environment, and Part F: Traffic Psychology and Behaviour; together, these journals form a comprehensive reference for current research in transportation science.