Laila A. Al-Essa, Endris Assen Ebrahim, Yusuf Ali Mergiaw
Bayesian regression modeling and inference of energy efficiency data: the effect of collinearity and sensitivity analysis
Frontiers in Energy Research, published 2024-07-30. DOI: 10.3389/fenrg.2024.1416126 (https://doi.org/10.3389/fenrg.2024.1416126)
Abstract
Most prior research has predicted heating demand using linear regression models without adequately accounting for the features of the buildings. Model problems such as multicollinearity need to be checked, and features must be selected based on their significance, to produce accurate load predictions and inferences. In the energy efficiency dataset, many building features correlate with each other and with heating load. Standard Ordinary Least Squares (OLS) regression becomes unreliable when the dataset exhibits multicollinearity. Bayesian supervised machine learning is a popular method for parameter estimation and inference when frequentist statistical assumptions fail. Predicting heating load, the energy efficiency output, with Bayesian inference in multiple regression under collinearity requires careful data analysis. Multicollinearity among the features in the building energy efficiency dataset significantly affected the parameter estimates and hypothesis tests. This study demonstrates several shrinkage and informative priors, applied alongside the likelihood in the Bayesian framework, as remedies for the collinearity problem in multiple regression analysis. It fitted the standard OLS regression and four distinct Bayesian regression models with several prior distributions, using the Hamiltonian Monte Carlo algorithm in Bayesian Regression Modeling using Stan and the package used to fit linear models. Several model comparison and assessment methods were used to select the best-fitting regression model for the dataset. For the collinear energy efficiency data, the Bayesian regression model with a weakly informative prior fitted best, outperforming both the standard OLS regression and the Bayesian regression models with shrinkage priors. Collinearity was assessed numerically using the variance inflation factor, the estimated regression coefficients and their standard errors, and the sensitivity of the results to the priors and likelihoods. Applied research in science, engineering, agriculture, health, and other disciplines should check for multicollinearity in regression modeling to obtain better estimation and inference.
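The variance inflation factor and the stabilizing effect of a prior mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' workflow: the data are synthetic, the function names `vif` and `gaussian_prior_posterior_mean` are hypothetical, and the closed-form posterior mean assumes a zero-mean Gaussian prior with known noise variance, standing in for the Hamiltonian Monte Carlo fits described in the paper.

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each column of X (X has no intercept column)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        target = X[:, j]
        # Regress column j on an intercept plus all the other columns.
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

def gaussian_prior_posterior_mean(X, y, sigma2, tau2):
    """Posterior mean of beta for y ~ N(X beta, sigma2 I) with prior beta ~ N(0, tau2 I)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + (sigma2 / tau2) * np.eye(p), X.T @ y)

# Synthetic data (an assumption for illustration): x2 is nearly a copy of x1.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 1.0 * x1 + 1.0 * x2 + 0.5 * x3 + rng.normal(size=n)

vifs = vif(X)
ols = np.linalg.lstsq(X, y, rcond=None)[0]
post = gaussian_prior_posterior_mean(X, y, sigma2=1.0, tau2=1.0)

print("VIF:", np.round(vifs, 1))
print("OLS coefficients:", np.round(ols, 2))
print("Posterior mean (Gaussian prior):", np.round(post, 2))
```

A VIF far above the common rule-of-thumb threshold of 10 flags the collinear pair, and the prior pulls the coefficient vector toward zero relative to OLS, which is the mechanism by which informative and shrinkage priors temper unstable estimates.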
About the journal:
Frontiers in Energy Research makes use of the unique Frontiers platform for open-access publishing and research networking for scientists, which provides an equal opportunity to seek, share and create knowledge. The mission of Frontiers is to place publishing back in the hands of working scientists and to promote an interactive, fair, and efficient review process. Articles are peer-reviewed according to the Frontiers review guidelines, which evaluate manuscripts on objective editorial criteria