J. Arnes, Alexander Hapfelmeier, Alexander Horsch, Tonje Braaten
Frontiers in Epidemiology. Published 2023-12-18. DOI: 10.3389/fepid.2023.1283705 (https://doi.org/10.3389/fepid.2023.1283705)
Greedy knot selection algorithm for restricted cubic spline regression
Non-linear regression modeling is common in epidemiology, both for prediction and for estimating relationships between predictor and response variables. Restricted cubic spline (RCS) regression is one such method and is, for example, highly relevant to Cox proportional hazards regression analysis. RCS regression models non-linear relationships with third-order polynomials joined at knot points. The standard approach places knots at a regular sequence of quantiles between the outer boundaries. A regression curve can easily be fitted to the sample using a relatively high number of knots, but this invites overfitting: the model fits the given sample well but does not generalize to other samples. A low knot count is thus preferred. However, the standard knot selection process can underperform in the sparser regions of the predictor variable, especially with a low number of knots, and it can overfit in the denser regions. We present a simple greedy search algorithm that uses a backward method for knot selection; in simulation experiments it shows lower prediction error and lower Bayesian information criterion (BIC) scores than the standard knot selection process. We have implemented the algorithm as part of an open-source R package, knutar.
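The idea of backward greedy knot selection can be illustrated with a minimal Python sketch. It is not the knutar implementation, and every name in it (`rcs_basis`, `quantile_knots`, `bic`, `greedy_backward_knots`) is hypothetical. It assumes an ordinary least-squares model with Gaussian errors, the common truncated-power form of the RCS basis, starting knots at evenly spaced quantiles between the 5th and 95th percentiles, and a rule that drops, at each step, the single knot whose removal gives the lowest BIC. The abstract does not specify any of these details.

```python
import numpy as np

def rcs_basis(x, knots):
    """Design matrix (intercept, x, k-2 spline terms) for a restricted cubic
    spline in truncated-power form; the fit is linear beyond the outer knots.
    Assumes at least 3 distinct knots."""
    x = np.asarray(x, dtype=float)
    t = np.sort(np.asarray(knots, dtype=float))
    k = len(t)
    if k < 3:
        raise ValueError("an RCS needs at least 3 knots")
    pos3 = lambda u: np.clip(u, 0.0, None) ** 3   # (u)_+^3
    gap = t[k - 1] - t[k - 2]
    scale = (t[k - 1] - t[0]) ** 2                # keeps columns on the scale of x
    cols = [np.ones_like(x), x]
    for j in range(k - 2):
        term = (pos3(x - t[j])
                - pos3(x - t[k - 2]) * (t[k - 1] - t[j]) / gap
                + pos3(x - t[k - 1]) * (t[k - 2] - t[j]) / gap)
        cols.append(term / scale)
    return np.column_stack(cols)

def quantile_knots(x, n_knots, outer=(0.05, 0.95)):
    """Knots at a regular sequence of quantiles between two outer quantiles
    (the outer values 0.05 and 0.95 are an assumption of this sketch)."""
    probs = np.linspace(outer[0], outer[1], n_knots)
    return [float(q) for q in np.quantile(x, probs)]

def bic(x, y, knots):
    """Gaussian BIC, up to an additive constant, of the least-squares RCS fit."""
    X = rcs_basis(x, knots)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    n = len(y)
    return n * np.log(rss / n) + X.shape[1] * np.log(n)

def greedy_backward_knots(x, y, n_start=10, n_target=4):
    """Start from many quantile knots; repeatedly drop the one knot whose
    removal yields the lowest BIC until n_target knots remain."""
    knots = quantile_knots(x, n_start)
    while len(knots) > max(n_target, 3):
        candidates = [knots[:i] + knots[i + 1:] for i in range(len(knots))]
        knots = min(candidates, key=lambda ks: bic(x, y, ks))
    return knots

# Illustrative data only: a skewed predictor, so some regions are sparse.
rng = np.random.default_rng(1)
x_demo = rng.exponential(1.0, size=300)
y_demo = np.sin(2.0 * x_demo) + rng.normal(0.0, 0.3, size=300)
print("standard knots:", quantile_knots(x_demo, 4))
print("greedy knots:  ", greedy_backward_knots(x_demo, y_demo, 10, 4))
```

Because each step only compares models that differ by a single knot, the search fits on the order of `n_start` squared models rather than every subset of knots. That is what makes it greedy, and it also means the result is not guaranteed to be the best possible knot set.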