Greedy knot selection algorithm for restricted cubic spline regression

J. Arnes, Alexander Hapfelmeier, Alexander Horsch, Tonje Braaten
Frontiers in Epidemiology. Published 2023-12-18. DOI: 10.3389/fepid.2023.1283705 (https://doi.org/10.3389/fepid.2023.1283705)

Abstract

Non-linear regression modeling is common in epidemiology, both for prediction and for estimating relationships between predictor and response variables. Restricted cubic spline (RCS) regression is one such method and is, for example, highly relevant to Cox proportional hazards regression analysis. RCS regression uses third-order polynomials joined at knot points to model non-linear relationships. The standard approach is to place knots at a regular sequence of quantiles between the outer boundaries. A regression curve can easily be fitted to the sample using a relatively high number of knots, but the problem is then overfitting: the regression model fits the given sample well but does not generalize well to other samples. A low knot count is thus preferred. However, the standard knot selection process can lead to underperformance in the sparser regions of the predictor variable, especially when using a low number of knots, and to overfitting in the denser regions. We present a simple greedy search algorithm using a backward method for knot selection that shows reduced prediction error and lower Bayesian information criterion scores than the standard knot selection process in simulation experiments. We have implemented the algorithm as part of an open-source R package, knutar.
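
The idea described in the abstract can be illustrated with a small sketch. The code below is not the knutar implementation and does not reproduce the authors' exact procedure. It assumes an ordinary least-squares fit with Gaussian errors, uses the common truncated-power form of the RCS basis, starts from 10 knots at evenly spaced quantiles between the 5th and 95th percentiles, keeps the two boundary knots fixed, and removes one inner knot at a time, each time dropping the knot whose removal gives the lowest BIC. The function names (`rcs_basis`, `bic`, `greedy_backward_knots`) and all default values are hypothetical choices made for illustration.

```python
import numpy as np


def rcs_basis(x, knots):
    """Design matrix (intercept, x, k-2 non-linear terms) for an RCS with k knots."""
    x = np.asarray(x, dtype=float)
    t = np.sort(np.asarray(knots, dtype=float))
    if len(t) < 3:
        raise ValueError("a restricted cubic spline needs at least 3 knots")

    def cube(u):
        # Truncated cubic: (u)_+^3
        return np.maximum(u, 0.0) ** 3

    span = t[-1] - t[-2]
    scale = (t[-1] - t[0]) ** 2  # rescaling for numerical conditioning only
    cols = [np.ones_like(x), x]
    for j in range(len(t) - 2):
        term = (
            cube(x - t[j])
            - cube(x - t[-2]) * (t[-1] - t[j]) / span
            + cube(x - t[-1]) * (t[-2] - t[j]) / span
        )
        cols.append(term / scale)
    return np.column_stack(cols)


def bic(x, y, knots):
    """Gaussian BIC (up to an additive constant) of the least-squares RCS fit."""
    X = rcs_basis(x, knots)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    n = len(y)
    return n * np.log(rss / n) + X.shape[1] * np.log(n)


def greedy_backward_knots(x, y, n_start=10, n_min=3):
    """Backward greedy search: drop one inner knot per step, keep the best BIC seen."""
    # Assumption: initial knots at a regular sequence of quantiles.
    knots = list(np.quantile(x, np.linspace(0.05, 0.95, n_start)))
    best_knots, best_bic = knots, bic(x, y, knots)
    while len(knots) > n_min:
        # Candidate knot sets: remove each inner knot in turn (boundaries stay).
        candidates = [knots[:i] + knots[i + 1:] for i in range(1, len(knots) - 1)]
        scores = [bic(x, y, c) for c in candidates]
        i_best = int(np.argmin(scores))
        knots = candidates[i_best]
        if scores[i_best] < best_bic:
            best_knots, best_bic = knots, scores[i_best]
    return np.array(best_knots), best_bic


# Usage on simulated data (illustrative only).
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 300)
y = np.sin(x) + rng.normal(0, 0.3, 300)
knots, score = greedy_backward_knots(x, y)
```

The sketch assumes a continuous predictor. With heavily tied values, two quantile knots can coincide, and the basis would then need de-duplicated knots. For a Cox model, as mentioned in the abstract, the least-squares fit and BIC would have to be replaced by the corresponding partial-likelihood quantities.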