A load frequency control strategy based on double deep Q-network and upper confidence bound algorithm of multi-area interconnected power systems
Jing Zhang, Feifei Peng, Lulu Wang, Yang Yang, Yingna Li
Computers & Electrical Engineering, vol. 120, Article 109778. Published 2024-10-23. DOI: 10.1016/j.compeleceng.2024.109778
Impact factor: 4.0. JCR: Q1 (Computer Science, Hardware & Architecture).
https://www.sciencedirect.com/science/article/pii/S0045790624007055
Citations: 0
Abstract
Reinforcement learning (RL)-based generation control strategies have been widely studied to address the limited adaptability of traditional automatic generation control (AGC) strategies to load disturbances caused by heterogeneous energy sources. To improve the control accuracy of RL-based strategies in load frequency control (LFC), this paper designs a strategy that combines a double deep Q-network with an upper confidence bound algorithm (DDQN-UCB) to solve the problem of agent decision-making in a nonlinear environment. First, the area control error (ACE) and control performance standard 1 (CPS1) of the LFC power system are incorporated into the design of the RL reward function. Second, the actual and estimated Q-values are calculated using the Q-network and the target Q-network, combined with the reward value. Third, the deviation loss between the two Q-values is calculated, and the network is updated by gradient descent on this loss. Finally, the UCB algorithm is introduced to balance how often each action is selected during random exploration, and the agent combines the greedy algorithm with the UCB algorithm to select a power-compensated control action to send to the environment. The IEEE multi-area LFC power system is used as the experimental validation model. Compared with five other algorithms, the proposed RL control algorithm improves pre-learning convergence accuracy by 57.5%. Furthermore, the LFC effectiveness test shows that the DDQN-UCB control strategy improves LFC accuracy while stabilizing the inter-area tie-line power exchange to within 1.8972 MW, thereby maintaining the stability of the power system.
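The abstract does not give the paper's network architecture, reward formula, or exact UCB expression, so the following is only a minimal sketch of the two generic building blocks it names: the standard double DQN target and a UCB-augmented greedy action choice. The function names, the discount factor `gamma`, and the exploration weight `c` are all assumptions made for illustration, and Q-values are passed in as plain lists rather than produced by real networks.

```python
import math


def ddqn_target(reward, q_online_next, q_target_next, gamma=0.9, done=False):
    """Standard double DQN target (textbook form, assumed here).

    The online Q-network picks the best next action; the target
    Q-network supplies the value of that action.
    """
    if done:
        return reward
    best_next = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best_next]


def td_loss(q_estimate, target):
    """Squared deviation between the estimated Q-value and the target."""
    return (target - q_estimate) ** 2


def ucb_select(q_values, counts, step, c=1.0):
    """Greedy choice on Q plus a UCB bonus that favors rarely tried actions.

    counts[a] is how often action a has been selected so far.
    Untried actions are returned first (lowest index wins).
    """
    for action, n in enumerate(counts):
        if n == 0:
            return action
    log_t = math.log(max(step, 1))
    scores = [q + c * math.sqrt(log_t / n) for q, n in zip(q_values, counts)]
    return max(range(len(scores)), key=lambda a: scores[a])


# Hypothetical numbers: three discrete power-compensation actions.
target = ddqn_target(1.0, [0.2, 0.8, 0.5], [1.0, 2.0, 3.0], gamma=0.5)
loss = td_loss(1.5, target)
action = ucb_select([1.0, 0.9], counts=[100, 1], step=101)
```

In this sketch the bonus term shrinks as an action's count grows, so a rarely selected action can win over one with a slightly higher Q-value, which is one way to even out selection frequencies during exploration. Setting `c=0` reduces the rule to a purely greedy choice.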
About the journal:
The impact of computers has nowhere been more revolutionary than in electrical engineering. The design, analysis, and operation of electrical and electronic systems are now dominated by computers, a transformation that has been motivated by the natural ease of interface between computers and electrical systems, and the promise of spectacular improvements in speed and efficiency.
Published since 1973, Computers & Electrical Engineering provides rapid publication of topical research into the integration of computer technology and computational techniques with electrical and electronic systems. The journal publishes papers featuring novel implementations of computers and computational techniques in areas like signal and image processing, high-performance computing, parallel processing, and communications. Special attention will be paid to papers describing innovative architectures, algorithms, and software tools.