Explainable representative-days clustering on low-voltage grid meters and feeders, with noise-aware multi-objective Bayesian optimization, applied to grid-congestion events
Konstantinos Theodorakos, Oscar Mauricio Agudelo, Thijs Becker, Koen Vanthournout, Reinhilde D’hulst, Bart De Moor
Sustainable Energy Grids & Networks, Vol. 41, Article 101622 (journal article, published 2025-01-18). DOI: 10.1016/j.segan.2025.101622. https://www.sciencedirect.com/science/article/pii/S2352467725000049
Abstract
Low-voltage grid (LVG) state estimations help in expansion planning and in preventing congestion events. However, country-scale simulations pose high computational burdens. A solution that reduces calculation time is to cluster similar days, and only simulate the most representative day of each cluster. Using real-world, quarter-hour residential consumption time series from 925 meters, congestion event probabilities from 146 real feeders, 51 daily meteorological and 12 calendar features, we propose a novel end-to-end representative-days clustering framework. Along with k-medoids clustering, we use dimensionality reduction (kernel principal component analysis, factor analysis, …) and pre/post-processing. To emphasize quarter-hour extremes, we apply dynamic data squeezing/expansion, based on the LVG consumption median. Our approach is scalable because dimensionality reduction compresses thousands of daily variables into up to 300 components, regardless of the number of meters and exogenous features (weather, calendar, …). Multi-objective Bayesian optimization for noisy functions, along with Sobol sampling, finds the hyperparameters that minimize the k representative-day reconstruction error compared to the full-year simulation. Explainability is achieved via tree ensemble classification on the decided clusters: ranking of the most important meteorological and calendar features, along with rule induction for future clustering decisions. We applied these techniques to successfully cluster all days of 2016 of the Flemish LVG (Belgium). Our approach works well on both meter consumptions and feeder congestion event approximations. Photosynthetic radiation (visible spectrum and solar panel absorption wavelengths), water temperature, soil water volume/temperature (variables with a recent weather memory effect) and albedo were the most important meteorological factors, along with calendar features.
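The abstract describes the representative-days idea only at a high level. The sketch below rests on several assumptions:

- distances are Euclidean on raw quarter-hour day profiles (366 days × 96 values, since 2016 is a leap year);
- clustering uses a plain alternating k-medoids, not necessarily the variant the authors used;
- the reconstruction error is hypothetical: the relative L2 gap between the full-year quarter-hour totals and the cluster-size-weighted sum of the medoid days.

Under those assumptions, the core step could look like this. The helper names `k_medoids` and `reconstruction_error` and the synthetic load data are illustrative only. Dimensionality reduction, median-based squeezing/expansion, and the Bayesian hyperparameter search are omitted.

```python
import numpy as np


def k_medoids(X, k, n_iter=100, seed=0):
    """Alternating (Voronoi-iteration) k-medoids on the rows (days) of X.

    Hypothetical helper. Returns (medoid_indices, labels); each medoid is an
    actual row of X, i.e. a real observed day.
    """
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances via |a|^2 + |b|^2 - 2 a.b (memory friendly).
    sq = (X ** 2).sum(axis=1)
    D = np.sqrt(np.clip(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0, None))
    np.fill_diagonal(D, 0.0)
    medoids = rng.choice(len(X), size=k, replace=False)
    for _ in range(n_iter):
        # Assign every day to its nearest medoid.
        labels = np.argmin(D[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue  # keep the old medoid if the cluster is empty
            # New medoid: member with the smallest summed distance to the others.
            within = D[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break  # converged: no medoid moved
        medoids = new_medoids
    labels = np.argmin(D[:, medoids], axis=1)
    return medoids, labels


def reconstruction_error(X, medoids, labels):
    """Hypothetical metric: relative L2 error between the full-year total per
    quarter-hour and the total rebuilt from medoid days weighted by cluster size."""
    weights = np.bincount(labels, minlength=len(medoids))
    full_year = X.sum(axis=0)
    reconstructed = (weights[:, None] * X[medoids]).sum(axis=0)
    return np.linalg.norm(full_year - reconstructed) / np.linalg.norm(full_year)


# Synthetic stand-in data: 366 days x 96 quarter-hours (2016 is a leap year).
rng = np.random.default_rng(42)
t = np.arange(96) / 96.0
base = 0.5 + 0.3 * np.sin(2 * np.pi * (t - 0.25))   # hypothetical daily load shape
scale = rng.uniform(0.6, 1.4, size=(366, 1))         # hypothetical day-to-day level
X = scale * base + 0.05 * rng.standard_normal((366, 96))

for k in (2, 6, 12):
    medoids, labels = k_medoids(X, k)
    print(f"k={k:2d}  reconstruction error={reconstruction_error(X, medoids, labels):.4f}")
```

Each medoid is an observed day, so it can be passed to a grid simulator unchanged. Its cluster size then acts as the weight that scales its result up to the full year. In the framework described above, the preprocessing and dimensionality-reduction hyperparameters are searched by noise-aware multi-objective Bayesian optimization rather than fixed as here. The loop over k only compares a few cluster counts.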
About the journal:
Sustainable Energy, Grids and Networks (SEGAN) is an international peer-reviewed publication for theoretical and applied research dealing with energy, information grids and power networks, including smart grids from super to micro grid scales. SEGAN welcomes papers describing fundamental advances in mathematical, statistical or computational methods with application to power and energy systems, as well as papers on applications, computation and modeling in the areas of electrical and energy systems with coupled information and communication technologies.