Online Prediction-Assisted Safe Reinforcement Learning for Electric Vehicle Charging Station Recommendation in Dynamically Coupled Transportation-Power Systems
Qionghua Liao, Guilong Li, Jiajie Yu, Ziyuan Gu, Wei Ma
arXiv - CS - Computational Engineering, Finance, and Science. Published 2024-07-30. DOI: arxiv-2407.20679
Abstract
With the proliferation of electric vehicles (EVs), the transportation network
and power grid are becoming increasingly interdependent and coupled via charging
stations. The concomitant growth in charging demand has posed challenges for
both networks, highlighting the importance of charging coordination. Existing
literature largely overlooks the interactions between power grid security and
traffic efficiency. In view of this, we study the en-route charging station
(CS) recommendation problem for EVs in dynamically coupled transportation-power
systems. The system-level objective is to maximize the overall traffic
efficiency while ensuring the safety of the power grid. We formulate this problem,
for the first time, as a constrained Markov decision process (CMDP) and propose an
online prediction-assisted safe reinforcement learning (OP-SRL) method that
learns an optimal and secure policy by extending proximal policy optimization (PPO).
Specifically, we address two main challenges. First, the constrained
optimization problem is converted into an equivalent unconstrained optimization
problem by applying the Lagrangian method. Second, to account for the uncertain
and potentially long delay between issuing a CS recommendation and the start of charging,
we propose an online sequence-to-sequence (Seq2Seq) predictor for state
augmentation that guides the agent toward forward-looking decisions. Finally,
we conduct comprehensive experimental studies based on the Nguyen-Dupuis
network and a large-scale real-world road network, coupled with IEEE 33-bus and
IEEE 69-bus distribution systems, respectively. Results demonstrate that the
proposed method outperforms baselines in terms of road network efficiency,
power grid safety, and EV user satisfaction. The case study on the real-world
network also illustrates the method's applicability in practice.
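
The abstract does not spell out how the Lagrangian method turns the constrained problem into an unconstrained one, so the following is only a generic primal-dual sketch and not the authors' OP-SRL algorithm. It assumes a scalar decision variable `x`, a reward `R(x)` to maximize, and a cost `C(x)` that must stay at or below `limit`. The function name, arguments, step sizes, and the toy reward and cost are all hypothetical. Plain gradient ascent stands in for the PPO policy update that the paper extends.

```python
def lagrangian_dual_ascent(reward_grad, cost, cost_grad, limit,
                           x0=0.0, lr_x=0.05, lr_lam=0.05, steps=5000):
    """Toy primal-dual solver for: maximize R(x) subject to C(x) <= limit.

    Hypothetical helper for illustration only. It works on the Lagrangian
    L(x, lam) = R(x) - lam * (C(x) - limit) with multiplier lam >= 0.
    """
    x, lam = x0, 0.0
    for _ in range(steps):
        # Primal step: gradient ascent on L with respect to x.
        # (In safe RL this would be the policy update, e.g. a PPO step.)
        x += lr_x * (reward_grad(x) - lam * cost_grad(x))
        # Dual step: increase lam while the constraint is violated,
        # decrease it otherwise, and project back onto lam >= 0.
        lam = max(0.0, lam + lr_lam * (cost(x) - limit))
    return x, lam


# Toy instance (assumed, not from the paper):
# R(x) = -(x - 1)^2 is maximized at x = 1, but the cost C(x) = x is capped at 0.3.
x_star, lam_star = lagrangian_dual_ascent(
    reward_grad=lambda x: -2.0 * (x - 1.0),
    cost=lambda x: x,
    cost_grad=lambda x: 1.0,
    limit=0.3,
)
```

In this toy instance the unconstrained optimum violates the cost limit, so the multiplier grows until the solution settles on the constraint boundary. Read loosely, the multiplier plays the role of a learned penalty weight on grid-safety violations, traded off against traffic efficiency.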