This paper proposes a Transformer-Enhanced Multi-Agent Reinforcement Learning (TE-MARL) framework for electric logistics fleet routing under time-varying traffic and coupled power-grid loads, with the goal of minimizing overall operating cost. First, we develop a linear congestion model with a 30-min smooth transition to approximate the continuous dynamics of traffic flows. In parallel, we establish an energy consumption–charging coupling model that captures the effects of congestion, speed, specific energy consumption, and peak–valley grid-load fluctuations on charging efficiency. Next, we design a three-module Transformer-based policy network that incorporates traffic-aware attention to strengthen the representation of congestion features. The framework employs centralized training with decentralized execution (CTDE) and Proximal Policy Optimization (PPO) to enhance generalization and stability in highly dynamic environments. Beyond conventional objectives such as minimizing travel time and energy consumption, we introduce a Traffic Adaptability Cost metric to quantify the robustness of routing solutions to traffic fluctuations. Extensive experiments under high station density and multi-vehicle coordination compare TE-MARL with state-of-the-art learning-based and heuristic baselines. The results indicate that, on a representative C100-S20-V8 instance, TE-MARL reduces total travel time and total energy consumption by 20.4% and 17.6% relative to a MARL baseline, and by 14.9% and 10.9% relative to a hybrid Q-DH method, while achieving 100% feasibility and higher traffic adaptability. These gains substantially mitigate the risks of peak-hour time-window violations and excessive energy use.
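The abstract mentions "traffic-aware attention" that strengthens the representation of congestion features but does not specify its form. A minimal sketch of one plausible realization is scaled dot-product attention with an additive congestion bias on the logits; the function name, the per-segment `congestion` input, and the scaling coefficient `beta` are all hypothetical illustrations, not the paper's actual mechanism.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of logits.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def traffic_aware_attention(queries, keys, values, congestion, beta=1.0):
    """Scaled dot-product attention with an additive congestion bias.

    congestion[j] in [0, 1] is the congestion level of road segment j;
    beta (a hypothetical hyperparameter) controls how strongly congestion
    shifts attention. Higher congestion raises the logit for that segment,
    so the policy attends more to congested parts of the network.
    """
    d = len(keys[0])
    out = []
    for q in queries:
        logits = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) + beta * c
            for k, c in zip(keys, congestion)
        ]
        w = softmax(logits)
        # Weighted sum of value vectors.
        out.append([sum(wj * v[i] for wj, v in zip(w, values))
                    for i in range(len(values[0]))])
    return out
```

With two identical keys, only the congestion bias differentiates the segments, so attention mass shifts toward the congested one:

```python
out = traffic_aware_attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 0.0], [1.0, 0.0]],
    values=[[0.0], [1.0]],
    congestion=[0.0, 1.0],
    beta=2.0,
)
# out[0][0] is the attention weight on the congested segment (> 0.5)
```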
