Effects of multicollinearity and data granularity on regression models of stream temperature

IF 5.9 | Zone 1, Earth Sciences | Q1, ENGINEERING, CIVIL | Journal of Hydrology | Pub Date: 2024-06-27 | DOI: 10.1016/j.jhydrol.2024.131572
Halil I. Dertli, Daniel B. Hayes, Troy G. Zorn

Abstract

Water temperature is a key factor influencing biota of stream ecosystems. Hence, it is important to comprehend the environmental drivers of stream temperature for robust prediction of conditions and effective management of stream communities. Linear regression models are commonly used for predictive purposes, but their predictive capacity and interpretability can be significantly affected by their complexity and the structure of input data. In some cases, researchers may be obligated to favor prediction power or interpretability while compromising the other. Therefore, insight into relationships between model fit, correlation among predictor variables (i.e., multicollinearity), and level of temporal aggregation of data (i.e., data granularity) may be helpful to reduce such trade-offs. In this paper, we investigated these relationships within a hierarchical set of multiple linear regression (MLR) models examining environmental factors influencing stream temperature dynamics. Our findings showed that as the number of predictor variables (i.e., model complexity) increased, the magnitude of multicollinearity in MLR models increased, but model fit also increased. The results also revealed that using data averaged over longer time frames (i.e., coarser data granularity) yielded high multicollinearity, as indexed by variance inflation factor values (VIF) for all model predictors. This led to higher variance in parameter estimates (i.e., parameter instability) and potential challenges in model interpretation as the sign of parameter estimates changed in many streams examined. Multicollinearity was not the only reason for these changes in the sign of parameter estimates as they were also observed in simple linear regression models across varying levels of data granularity. Based on our findings, we conclude that the selection of data granularity is an important consideration in multiple regression modeling, with profound implications for model interpretability.
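The link between temporal aggregation and multicollinearity described above can be illustrated with a minimal sketch. It does not reproduce the authors' data or models: the three predictors (`air_temp`, `solar`, `discharge`), their seasonal amplitudes and noise levels, and the 1-, 7- and 28-day windows are all hypothetical choices made for illustration. The VIF of predictor j is computed as 1 / (1 - R_j^2), which equals the j-th diagonal entry of the inverse of the predictor correlation matrix.

```python
import numpy as np


def vif(X):
    """Variance inflation factor for each column of X (n_samples, n_predictors).

    VIF_j = 1 / (1 - R_j^2), which equals the j-th diagonal element of the
    inverse of the predictor correlation matrix.
    """
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))


def aggregate_mean(X, window):
    """Average consecutive blocks of `window` rows, dropping an incomplete tail."""
    n = (X.shape[0] // window) * window
    return X[:n].reshape(-1, window, X.shape[1]).mean(axis=1)


# Synthetic daily predictors (hypothetical, not the study's data): each one
# shares a seasonal cycle and carries its own independent day-to-day noise.
rng = np.random.default_rng(0)
days = np.arange(364)
season = np.sin(2 * np.pi * days / 364)
air_temp = 10 + 8 * season + rng.normal(0, 3, days.size)
solar = 200 + 100 * season + rng.normal(0, 60, days.size)
discharge = 5 - 2 * season + rng.normal(0, 1.5, days.size)
X_daily = np.column_stack([air_temp, solar, discharge])

# Coarser granularity averages out the independent noise, so the shared
# seasonal signal dominates and the predictors become more collinear.
vif_by_granularity = {
    label: vif(aggregate_mean(X_daily, window))
    for label, window in [("daily", 1), ("weekly", 7), ("monthly", 28)]
}

for label, values in vif_by_granularity.items():
    print(label, np.round(values, 2))
```

In this toy setting, averaging suppresses the independent noise in each predictor while leaving the shared seasonal cycle intact, so VIF values rise as the window lengthens. Real stream data may behave differently, and the sketch says nothing about the sign changes in parameter estimates that the abstract also reports.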

Source journal: Journal of Hydrology (Earth Sciences: Geosciences, Multidisciplinary)
CiteScore: 11.00
Self-citation rate: 12.50%
Articles published: 1309
Review time: 7.5 months
About the journal: The Journal of Hydrology publishes original research papers and comprehensive reviews in all the subfields of the hydrological sciences, including water-based management and policy issues that impact on economics and society. These comprise, but are not limited to, the physical, chemical, biogeochemical, stochastic and systems aspects of surface and groundwater hydrology, hydrometeorology and hydrogeology. Relevant topics incorporating the insights and methodologies of disciplines such as climatology, water resource systems, hydraulics, agrohydrology, geomorphology, soil science, instrumentation and remote sensing, civil and environmental engineering are included. Social science perspectives on hydrological problems such as resource and ecological economics, environmental sociology, psychology and behavioural science, management and policy analysis are also invited. Multi- and interdisciplinary analyses of hydrological problems are within scope. The science published in the Journal of Hydrology is relevant to catchment scales rather than exclusively to a local scale or site.