Performance of binary prediction models in high-correlation low-dimensional settings: a comparison of methods.

Artuur M Leeuwenberg, Maarten van Smeden, Johannes A Langendijk, Arjen van der Schaaf, Murielle E Mauer, Karel G M Moons, Johannes B Reitsma, Ewoud Schuit
Journal: Diagnostic and Prognostic Research. Published: 2022-01-11. DOI: 10.1186/s41512-021-00115-5 (https://doi.org/10.1186/s41512-021-00115-5). Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8751246/pdf/
Citation count: 10

Abstract

Background: Clinical prediction models are developed widely across medical disciplines. When predictors in such models are highly collinear, unexpected or spurious predictor-outcome associations may occur, thereby potentially reducing face-validity of the prediction model. Collinearity can be dealt with by exclusion of collinear predictors, but when there is no a priori motivation (besides collinearity) to include or exclude specific predictors, such an approach is arbitrary and possibly inappropriate.
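
To make "highly collinear" concrete, the following minimal sketch computes the Pearson correlation between two predictors and the corresponding variance inflation factor (VIF), which for a pair of predictors reduces to 1 / (1 - r^2). It is an illustration only, not the procedure used in the paper: the function names and the toy values of `x1` and `x2` are assumptions, and VIF is a linear-regression diagnostic used here purely as a rough indicator of collinearity.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equally long numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(r):
    """VIF of either predictor when only two predictors are in the model."""
    return 1.0 / (1.0 - r ** 2)


# Hypothetical toy data: x2 is almost a copy of x1, so the two are collinear.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1]

r = pearson_r(x1, x2)
vif = vif_two_predictors(r)
print(f"r = {r:.3f}, VIF = {vif:.1f}")
```

A VIF near 1 indicates little collinearity; the larger the value, the more the coefficient estimate of one predictor depends on the other predictor also being in the model.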

Methods: We compare different methods to address collinearity, including shrinkage, dimensionality reduction, and constrained optimization. The effectiveness of these methods is illustrated via simulations.

Results: In the conducted simulations, no effect of collinearity was observed on predictive outcomes (AUC, R², Intercept, Slope) across methods. However, a negative effect of collinearity on the stability of predictor selection was found, affecting all compared methods, but in particular methods that perform strong predictor selection (e.g., Lasso). Methods for which the included set of predictors remained most stable under increased collinearity were Ridge, PCLR, LAELR, and Dropout.
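
One way to think about "stability of predictor selection" is to look at how often each predictor is retained across repeated model fits and how similar the selected sets are to each other. The sketch below does this with per-predictor selection frequencies and the mean pairwise Jaccard similarity. Both measures and all names are assumptions chosen for illustration; the abstract does not state which stability measure the authors used, and the example sets are invented, not results from the paper.

```python
from itertools import combinations


def selection_frequency(selected_sets, predictors):
    """Fraction of fits in which each predictor was retained."""
    n_fits = len(selected_sets)
    return {p: sum(p in s for s in selected_sets) / n_fits for p in predictors}


def mean_pairwise_jaccard(selected_sets):
    """Average Jaccard similarity over all pairs of selected predictor sets."""
    def jaccard(a, b):
        union = a | b
        return 1.0 if not union else len(a & b) / len(union)

    pairs = list(combinations(selected_sets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


predictors = ["x1", "x2", "x3"]

# Invented example: x1 and x2 are collinear, so a selecting method keeps
# one or the other, depending on the sample.
selecting_method = [{"x1", "x3"}, {"x2", "x3"}, {"x1", "x3"}, {"x2", "x3"}]
# Invented example: a method that always keeps every predictor.
non_selecting_method = [{"x1", "x2", "x3"} for _ in range(4)]

print(selection_frequency(selecting_method, predictors))
print(mean_pairwise_jaccard(selecting_method))
print(mean_pairwise_jaccard(non_selecting_method))
```

In this toy setting the selecting method retains each of the two collinear predictors in only half of the fits, which lowers the similarity between fits, while the method that keeps all predictors is perfectly stable by construction.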

Conclusions: Based on the results, we would recommend refraining from data-driven predictor selection approaches in the presence of high collinearity, because of the increased instability of predictor selection, even in relatively high events-per-variable settings. The selection of certain predictors over others may disproportionally give the impression that included predictors have a stronger association with the outcome than excluded predictors.
