HO2RL: A Novel Hybrid Offline-and-Online Reinforcement Learning Method for Active Pantograph Control

IEEE Transactions on Industrial Electronics, vol. 72, no. 6, pp. 6286–6296. Published: 2024-11-06. DOI: 10.1109/TIE.2024.3477002. Impact Factor: 7.2, Q1 (Automation & Control Systems), CAS Tier 1 (Engineering & Technology).
Hui Wang; Zhigang Liu; Zhiwei Han
Citations: 0

Abstract

The pantograph–catenary system (PCS) is vital for high-speed trains to collect electrical power; fluctuations in the contact force severely degrade current collection quality, increase maintenance costs, and compromise operational safety. Reinforcement learning (RL) is an attractive approach for learning an active pantograph control policy by trial and error. However, traditional RL methods suffer significant performance degradation, or collapse outright, when deployed in the real world because of the large sim-to-real gap. We propose a hybrid offline-and-online reinforcement learning (HO2RL) algorithm for active pantograph control that combines RL policy pretraining on offline transitions with performance enhancement through online data collection. The algorithm produces generalized pretrained models by learning an effective behavior policy from offline experience, and then performs multidomain adaptation through online performance improvement with dynamics-aware policy evaluation. Experimental results demonstrate that HO2RL learns efficiently from large, diverse static datasets and achieves steady performance gains through fine-tuning with online interactions. The method solves active pantograph control tasks across various operating scenarios and achieves state-of-the-art performance on the PCS standard benchmark.
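The two-phase recipe described in the abstract, pretraining a policy from logged offline transitions and then improving it with online interaction, can be sketched in miniature as follows. This is an illustrative toy only, not the paper's HO2RL algorithm: the one-dimensional contact-force dynamics, the linear state-feedback policy, and the hill-climbing online update are all assumptions introduced for the sketch.

```python
# Minimal offline-to-online RL sketch (illustrative assumptions throughout;
# not the HO2RL algorithm from the paper).
import numpy as np

rng = np.random.default_rng(0)

# Toy pantograph-like environment: state = (contact-force error, its rate);
# the action is an actuator force that should damp the fluctuation.
def step(state, action):
    err, derr = state
    derr = 0.9 * derr - 0.1 * err + 0.05 * action + 0.01 * rng.standard_normal()
    err = err + derr
    reward = -(err ** 2)  # penalize contact-force fluctuation
    return np.array([err, derr]), reward

def rollout(w, horizon=50):
    """Episode return of the linear state-feedback policy a = w @ s."""
    state, total = np.array([1.0, 0.0]), 0.0
    for _ in range(horizon):
        state, r = step(state, float(w @ state))
        total += r
    return total

# Phase 1: offline pretraining by behavior cloning (least squares) on
# logged (state, action) transitions from an assumed behavior controller.
states = rng.standard_normal((500, 2))
behavior_w = np.array([-2.0, -1.0])
actions = states @ behavior_w + 0.1 * rng.standard_normal(500)
w, *_ = np.linalg.lstsq(states, actions, rcond=None)

# Phase 2: online fine-tuning by simple hill climbing on episode return
# (a crude stand-in for the paper's dynamics-aware online improvement).
best_return = rollout(w)
for _ in range(100):
    cand = w + 0.1 * rng.standard_normal(2)
    ret = rollout(cand)
    if ret > best_return:
        w, best_return = cand, ret

print("fine-tuned gains:", w, "return:", best_return)
```

The point of the sketch is the structure, not the numbers: offline data yields a reasonable initial policy without touching the environment, and the online phase then adapts it to the actual dynamics it experiences.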
Source journal

IEEE Transactions on Industrial Electronics (Engineering: Electrical & Electronic). CiteScore: 16.80. Self-citation rate: 9.10%. Articles per year: 1396. Review time: 6.3 months.

About the journal: IEEE Transactions on Industrial Electronics is published monthly. Its scope covers applications of electronics, controls, and communications in industrial and manufacturing systems and processes; power electronics and drive control techniques; system control and signal processing; fault detection and diagnosis; power systems; instrumentation, measurement, and testing; modeling and simulation; motion control; robotics; sensors and actuators; implementation of neural networks, fuzzy logic, and artificial intelligence in industrial systems; factory automation; and communication and computer networks.