Developing robust decision-making and control systems for autonomous driving remains a significant challenge in complex, dynamic environments that involve multi-vehicle interactions at intersections, roundabouts, and merging ramps. Reinforcement Learning (RL) is a promising approach in this context, but its two primary modes of application present a dilemma. Offline RL learns from a fixed dataset and therefore adapts poorly to real-world conditions, whereas online RL must learn through real-world interaction, which is inherently unsafe for driving. To address these issues, this paper proposes a Transformer-based Offline-to-online Reinforcement Learning (TORL) framework. In the offline stage, the framework integrates a Transformer architecture with a maximum entropy mechanism, which allows the model to capture long-term temporal dependencies for high-performance decision-making and control while keeping the initial policy robust and generalizable. During online fine-tuning, the framework combines three mechanisms: Human-in-the-Loop (HITL) safe exploration, a hybrid replay buffer, and a mixed data-source learning approach. Together, these mitigate the performance degradation caused by distributional shift and address the critical safety risks of online exploration. Comprehensive experiments in the MetaDrive simulation environment show that TORL surpasses baseline methods, achieving absolute increases of approximately 29.4% in normalized return and 46.1% in task success rate while maintaining a zero-collision record. Furthermore, the framework's real-time feasibility was validated on an experimental autonomous vehicle platform, where it showed computational latency low enough for practical deployment.
This study demonstrates that the proposed offline-to-online RL paradigm offers a robust and effective solution for developing high-performance decision-making and control systems for autonomous vehicles.
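
The abstract names a hybrid replay buffer and mixed data-source learning but does not specify their design. As a rough illustration only, the sketch below shows one common way such a buffer could be organized: a fixed offline dataset is kept intact, online transitions are stored in a bounded FIFO, and each training batch mixes the two sources. The class name `HybridReplayBuffer`, the `online_ratio` hyperparameter, and the `human_override` flag (marking steps where a human took over during HITL exploration) are all hypothetical and are not taken from the paper.

```python
import random
from collections import deque


class HybridReplayBuffer:
    """Illustrative sketch (not the paper's implementation): mixes a fixed
    offline dataset with a bounded FIFO of online transitions."""

    def __init__(self, offline_data, online_capacity=10_000, online_ratio=0.5, seed=0):
        # Offline transitions are tagged once and never evicted.
        self.offline = [{**t, "source": "offline"} for t in offline_data]
        # Online transitions live in a FIFO; the oldest are dropped when full.
        self.online = deque(maxlen=online_capacity)
        # Assumed hyperparameter: target fraction of each batch drawn from online data.
        self.online_ratio = online_ratio
        self.rng = random.Random(seed)

    def add_online(self, transition, human_override=False):
        # Record whether a human intervened on this step (hypothetical HITL flag),
        # so a learner could weight or filter those transitions.
        self.online.append(
            {**transition, "source": "online", "human_override": human_override}
        )

    def sample(self, batch_size):
        # Take as many online samples as the ratio allows, capped by availability;
        # the remainder of the batch comes from the offline dataset.
        n_online = min(int(round(batch_size * self.online_ratio)), len(self.online))
        n_offline = batch_size - n_online
        if n_offline and not self.offline:
            raise ValueError("offline dataset is empty")
        batch = self.rng.choices(self.offline, k=n_offline) if n_offline else []
        if n_online:
            batch += self.rng.sample(list(self.online), n_online)
        self.rng.shuffle(batch)
        return batch
```

Under these assumptions, early fine-tuning batches are dominated by offline data because few online transitions exist yet, and the online share grows toward `online_ratio` as interaction data accumulates, which is one plausible way to soften the distributional shift between the two stages.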
