Coordinated Reactive Power Optimization for Transmission and Distribution System With Gross Prediction Errors: A Modified Belief Markov Decision Process-Based Reinforcement Learning Methodology

IEEE Transactions on Power Systems (IF 7.2; CAS Zone 1, Engineering & Technology; JCR Q1, Engineering, Electrical & Electronic) Pub Date: 2025-01-13 DOI: 10.1109/TPWRS.2025.3528428

Yaru Gu; Xueliang Huang

IEEE Transactions on Power Systems, vol. 40, no. 5, pp. 3619-3631. https://ieeexplore.ieee.org/document/10839083/


Abstract

Given the significant uncertainty of distributed generation (DG) in transmission and distribution (T&D) systems, we propose a novel Modified Belief Markov Decision Process-based (MBMDP-based) Reinforcement Learning scheme, MBMRL, for the Day-ahead Coordinated Reactive Power Optimization Problem (DCRPOP) with gross prediction errors. First, we formulate DCRPOP as a Partially Observable Markov Decision Process (POMDP) model with an embedded belief state, which uses the probability distribution of the error-contaminated observed state to represent the precise state. Second, the POMDP model is transformed into the MBMDP model by incorporating a misestimated belief state probability vector into the belief state update process, which enhances the confidence level when the data discrepancy is significant. Then, the MBMDP block, which carries a high confidence level for the precise state, is fed into the underlying network architecture of the multi-agent actor-attention-critic algorithm, helping agents independently capture features and output optimal decisions even with significant data errors. Case studies are conducted on two T&D systems of different scales. The training dataset is constructed from a real historical database from Suzhou, China. Simulation results validate the superior performance and scalability of the proposed methodology.
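As a rough illustration of the belief-state idea mentioned in the abstract, the sketch below implements the textbook discrete Bayes update used in POMDPs, plus a simple robust variant. The robust variant is an assumption made for illustration only: it mixes the observation likelihood with a uniform "gross error" term, and it is not the paper's misestimated belief state probability vector, whose exact form the abstract does not give. The function names, the parameter `gross_error_prob`, and the two-state example are all hypothetical.

```python
from typing import List


def belief_update(belief: List[float],
                  transition: List[List[float]],
                  obs_likelihood: List[float]) -> List[float]:
    """Textbook discrete POMDP belief update (Bayes filter step).

    belief[s]          prior probability of state s
    transition[s][t]   P(t | s, a) for the action that was taken
    obs_likelihood[t]  P(o | t) for the observation that was received
    """
    n = len(belief)
    # Prediction: push the prior through the transition model.
    predicted = [sum(belief[s] * transition[s][t] for s in range(n))
                 for t in range(n)]
    # Correction: weight each state by how well it explains the observation.
    unnormalized = [obs_likelihood[t] * predicted[t] for t in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        # The observation is impossible under the model; keep the prediction.
        return predicted
    return [u / total for u in unnormalized]


def robust_belief_update(belief: List[float],
                         transition: List[List[float]],
                         obs_likelihood: List[float],
                         gross_error_prob: float,
                         num_observations: int) -> List[float]:
    """Hypothetical robust variant (NOT the paper's MBMDP update).

    Assumes that with probability gross_error_prob the observation is a
    gross error drawn uniformly from num_observations possible values,
    so it carries no information about the state.
    """
    uniform = 1.0 / num_observations
    mixed = [(1.0 - gross_error_prob) * p + gross_error_prob * uniform
             for p in obs_likelihood]
    return belief_update(belief, transition, mixed)


# Hypothetical two-state example: state 0 = "low voltage", state 1 = "normal".
prior = [0.5, 0.5]
stay = [[1.0, 0.0], [0.0, 1.0]]   # assumed static state, for simplicity
likelihood = [0.9, 0.1]           # the observation strongly suggests state 0

standard = belief_update(prior, stay, likelihood)
robust = robust_belief_update(prior, stay, likelihood,
                              gross_error_prob=0.5, num_observations=2)
print(standard)
print(robust)
```

In this toy setting the robust update moves the belief toward state 0 less aggressively than the standard update, which is the general effect one would expect from any scheme that discounts observations suspected of containing gross errors.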



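Source journal

IEEE Transactions on Power Systems (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 15.80

Self-citation rate: 7.60%

Articles published: 696

Review time: 3 months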

Journal description: The scope of IEEE Transactions on Power Systems covers the education, analysis, operation, planning, and economics of electric generation, transmission, and distribution systems for general industrial, commercial, public, and domestic consumption, including the interaction with multi-energy carriers. The focus of the Transactions is the power system from a systems viewpoint rather than the components of the system. Its scope has five key areas, each with several technical topics: (1) Power Engineering Education, (2) Power System Analysis, Computing, and Economics, (3) Power System Dynamic Performance, (4) Power System Operations, and (5) Power System Planning and Implementation.