Xuan Lin, Zhicheng Liang, Tiesheng Yan, Taiqiang Cao, Hua Cheng, Jian Mao, Rui Deng
{"title":"货运列车速度轨迹优化的q -学习","authors":"Xuan Lin, Zhicheng Liang, Tiesheng Yan, Taiqiang Cao, Hua Cheng, Jian Mao, Rui Deng","doi":"10.1117/12.2652584","DOIUrl":null,"url":null,"abstract":"The train speed trajectory optimization (TSTO) aims at finding the optimal speed trajectory considering the safety, energy efficiency, punctuality and stopping accuracy. From the perspective of mitigating the greenhouse effect, it’s quite significant to study the TSTO problem. This paper proposed an optimization algorithm based on Reinforcement Learning (RL). Firstly, a global optimization model using RL was established. In the model, the control sequence including the control regimes and their switching points was taken as the state. The optimization objectives were taken as the reward function. The adjustment of the position of the switching points in the control sequence was taken as the decision space of the agent. Secondly, an adjustment method of the control sequence based on the deep Q-learning and embedding matrix was proposed. The training data was sampled using the experience replay. The optimal control sequence was obtained through the iterative training of the neural network. Finally, the optimization algorithm based on RL was compared with the driving strategies based on the Pontryagin’s Maximum Principle (PMP) and the field test data. The results show that the energy consumption of the proposed algorithm is reduced by 0.16% in comparison with that of the PMP, which proves that the proposed method can be applied to the multi-objective optimization of the train operation. Comparing with the field test data, the energy consumption of the optimization algorithm is reduced by 4.89%, which demonstrates that the proposed method can be used to guide the drivers to drive the freight train energy-efficiently.","PeriodicalId":116712,"journal":{"name":"Frontiers of Traffic and Transportation Engineering","volume":"331 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Q-learning for the speed trajectory optimization of the freight train\",\"authors\":\"Xuan Lin, Zhicheng Liang, Tiesheng Yan, Taiqiang Cao, Hua Cheng, Jian Mao, Rui Deng\",\"doi\":\"10.1117/12.2652584\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The train speed trajectory optimization (TSTO) aims at finding the optimal speed trajectory considering the safety, energy efficiency, punctuality and stopping accuracy. From the perspective of mitigating the greenhouse effect, it’s quite significant to study the TSTO problem. This paper proposed an optimization algorithm based on Reinforcement Learning (RL). Firstly, a global optimization model using RL was established. In the model, the control sequence including the control regimes and their switching points was taken as the state. The optimization objectives were taken as the reward function. The adjustment of the position of the switching points in the control sequence was taken as the decision space of the agent. Secondly, an adjustment method of the control sequence based on the deep Q-learning and embedding matrix was proposed. The training data was sampled using the experience replay. The optimal control sequence was obtained through the iterative training of the neural network. 
Finally, the optimization algorithm based on RL was compared with the driving strategies based on the Pontryagin’s Maximum Principle (PMP) and the field test data. The results show that the energy consumption of the proposed algorithm is reduced by 0.16% in comparison with that of the PMP, which proves that the proposed method can be applied to the multi-objective optimization of the train operation. Comparing with the field test data, the energy consumption of the optimization algorithm is reduced by 4.89%, which demonstrates that the proposed method can be used to guide the drivers to drive the freight train energy-efficiently.\",\"PeriodicalId\":116712,\"journal\":{\"name\":\"Frontiers of Traffic and Transportation Engineering\",\"volume\":\"331 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-11-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Frontiers of Traffic and Transportation Engineering\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1117/12.2652584\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Frontiers of Traffic and Transportation Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1117/12.2652584","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Q-learning for the speed trajectory optimization of the freight train
Train speed trajectory optimization (TSTO) aims to find the optimal speed trajectory subject to safety, energy efficiency, punctuality, and stopping accuracy. From the perspective of mitigating the greenhouse effect, studying the TSTO problem is highly significant. This paper proposes an optimization algorithm based on Reinforcement Learning (RL). First, a global optimization model using RL is established. In the model, the control sequence, consisting of the control regimes and their switching points, is taken as the state; the optimization objectives form the reward function; and adjusting the positions of the switching points in the control sequence constitutes the agent's decision space. Second, a control-sequence adjustment method based on deep Q-learning and an embedding matrix is proposed. Training data are sampled using experience replay, and the optimal control sequence is obtained through iterative training of the neural network. Finally, the RL-based optimization algorithm is compared with driving strategies based on Pontryagin's Maximum Principle (PMP) and with field test data. The results show that the energy consumption of the proposed algorithm is 0.16% lower than that of the PMP strategy, which indicates that the proposed method can be applied to the multi-objective optimization of train operation. Compared with the field test data, the energy consumption of the optimization algorithm is reduced by 4.89%, which demonstrates that the proposed method can be used to guide drivers to operate freight trains energy-efficiently.
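The formulation summarized above, where the state is a control sequence of regimes and switching points, the action shifts a switching point, and the reward encodes the optimization objectives, lends itself to a compact sketch. The toy track model, cost weights, and the tabular Q-table below are assumptions standing in for the paper's train dynamics and deep Q-network with embedding matrix; this is a minimal illustration of the decision space and of Q-learning with experience replay, not the authors' implementation.

```python
# Minimal, illustrative sketch of the RL formulation described in the abstract.
# A toy control sequence of three regimes (traction, coasting, braking) is
# encoded by its two switching positions; the agent nudges a switching point
# left or right and receives a reward combining energy use and punctuality.
# All dynamics, costs, and hyperparameters are invented placeholders, and a
# simple tabular Q-table stands in for the paper's deep Q-network.
import random
from collections import deque

TRACK_LEN = 100                                  # discretised track positions (assumed)
ACTIONS = [(0, -1), (0, +1), (1, -1), (1, +1)]   # (which switching point, shift direction)

def simulate(switch_points):
    """Toy surrogate: returns (energy, trip_time) for a traction/coast/brake profile."""
    s1, s2 = switch_points
    energy = 1.0 * s1 + 0.1 * (s2 - s1)          # traction segment dominates energy use
    trip_time = 0.8 * s1 + 1.5 * (s2 - s1) + 1.0 * (TRACK_LEN - s2)
    return energy, trip_time

def reward(switch_points, target_time=150.0, w_energy=1.0, w_time=0.5):
    """Reward = negative weighted sum of energy use and schedule deviation."""
    energy, trip_time = simulate(switch_points)
    return -(w_energy * energy + w_time * abs(trip_time - target_time))

q_table = {}                       # state (tuple of switching points) -> list of action values
replay = deque(maxlen=1000)        # experience replay buffer
alpha, gamma, eps = 0.1, 0.9, 0.2

state = (40, 70)                   # initial switching points (assumed)
for episode in range(2000):
    # epsilon-greedy selection over "move switching point i by +/-1 position"
    if random.random() < eps or state not in q_table:
        a = random.randrange(len(ACTIONS))
    else:
        a = max(range(len(ACTIONS)), key=lambda i: q_table[state][i])
    idx, shift = ACTIONS[a]
    nxt = list(state)
    nxt[idx] = min(max(nxt[idx] + shift, 1), TRACK_LEN - 1)
    if nxt[0] >= nxt[1]:           # keep the regimes in physical order
        nxt = list(state)
    nxt = tuple(nxt)
    r = reward(nxt)
    replay.append((state, a, r, nxt))

    # sample a stored transition and apply the standard Q-learning update
    s, a_s, r_s, ns = random.choice(replay)
    q_table.setdefault(s, [0.0] * len(ACTIONS))
    q_table.setdefault(ns, [0.0] * len(ACTIONS))
    q_table[s][a_s] += alpha * (r_s + gamma * max(q_table[ns]) - q_table[s][a_s])
    state = nxt

best = max(q_table, key=lambda s: max(q_table[s]))
print("best switching points found:", best, "reward:", round(reward(best), 2))
```

In the paper the state space is far larger, so a neural network with an embedding matrix approximates the action values instead of a lookup table, but the loop structure (perturb a switching point, evaluate the objectives, store the transition, replay it for the value update) is the same idea.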