Predicting the time to get back to work using statistical models and machine learning approaches.

IF 3.9 · CAS Zone 3 (Medicine) · Q1 HEALTH CARE SCIENCES & SERVICES · BMC Medical Research Methodology · Pub Date: 2024-11-29 · DOI: 10.1186/s12874-024-02390-4

George Bouliotis, M Underwood, R Froud

BMC Medical Research Methodology 24(1): 295. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11606207/pdf/

Citations: 0

Abstract

Background: It is unknown whether machine learning approaches outperform classical statistical models for survival analysis, especially when the proportional hazards assumption does not hold.

Objectives: To compare the model performance and predictive accuracy of classical regression models and machine learning approaches, using data from the Inspiring Families programme.

Methods: The Inspiring Families programme aims to help members of families with complex issues return to work. We explored predictors of time to return to work using two proportional hazards models fitted in Stata (the semi-parametric Cox model and the flexible parametric Parmar-Royston model) and compared them with three machine learning approaches: survival penalised regression with an elastic net penalty (scikit-survival), a (conditional) survival forest algorithm (pySurvival) and a (kernel) survival support vector machine (pySurvival).
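
The elastic net penalised regression named in the Methods adds a penalty to the Cox partial likelihood. The sketch below uses plain Python with no dependencies to compute that penalised objective for a given coefficient vector. It is not the authors' code or the scikit-survival implementation. The function names, the `alpha`/`l1_ratio` parameter names (borrowed from common elastic net conventions), the 1/n scaling and the toy data are all assumptions made for illustration.

```python
import math
from typing import Sequence


def cox_neg_log_partial_likelihood(
    times: Sequence[float],
    events: Sequence[int],
    X: Sequence[Sequence[float]],
    beta: Sequence[float],
) -> float:
    """Negative Cox log partial likelihood (Breslow handling of tied event times)."""
    n = len(times)
    # Linear predictor eta_i = x_i . beta for each participant.
    eta = [sum(x * b for x, b in zip(X[i], beta)) for i in range(n)]
    loglik = 0.0
    for i in range(n):
        if not events[i]:
            continue  # censored participants contribute only through risk sets
        # Risk set: everyone whose follow-up time is >= times[i],
        # i.e. still not back in work and still under observation at times[i].
        denom = sum(math.exp(eta[k]) for k in range(n) if times[k] >= times[i])
        loglik += eta[i] - math.log(denom)
    return -loglik


def elastic_net_cox_objective(
    times: Sequence[float],
    events: Sequence[int],
    X: Sequence[Sequence[float]],
    beta: Sequence[float],
    alpha: float = 0.1,
    l1_ratio: float = 0.5,
) -> float:
    """Average negative log partial likelihood plus an elastic net penalty.

    alpha sets the overall penalty strength and l1_ratio mixes the lasso (L1)
    and ridge (L2) parts. This scaling is an assumption and may differ from
    the constants used inside scikit-survival.
    """
    l1 = sum(abs(b) for b in beta)
    l2 = sum(b * b for b in beta)
    penalty = alpha * (l1_ratio * l1 + 0.5 * (1.0 - l1_ratio) * l2)
    return cox_neg_log_partial_likelihood(times, events, X, beta) / len(times) + penalty


# Hypothetical toy data (not from the study): days to first job, event flag
# (1 = returned to work, 0 = censored) and two binary baseline variables.
times = [5, 3, 8, 2]
events = [1, 1, 0, 1]
X = [[1, 0], [0, 1], [0, 0], [1, 1]]
print(elastic_net_cox_objective(times, events, X, beta=[0.0, 0.0]))  # equals math.log(24) / 4
print(elastic_net_cox_objective(times, events, X, beta=[0.5, 0.2]))
```

A fitting routine would search for the `beta` that minimises this objective. A larger `alpha` shrinks more coefficients towards zero, which can help when there are many binary predictors such as the 61 used in this study.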

Results: At baseline we obtained data on 61 binary variables from all 3161 participants. No model appeared superior, and predictive power was low (concordance index between 0.51 and 0.61). The median time to finding a first job was about 254 days. The five variables contributing most were 'family issues and additional barriers', 'restriction of hours', 'available CV', 'self-employment considered' and 'education'. Harrell's concordance index ranged from 0.60 (Cox model) to 0.71 (Random Survival Forest), suggesting a better fit for the machine learning approaches. However, predicted median times for a selected scenario showed only minor differences between models.
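
The Results report two summary quantities: Harrell's concordance index and the median time to first job. Both can be sketched in a few lines of plain Python. This illustrative version is not the code used in the study and rests on several assumptions. Pairs with tied follow-up times are skipped in the C-index. The median is read off a Kaplan-Meier curve. The risk scores and data are hypothetical.

```python
from fractions import Fraction
from typing import Optional, Sequence


def harrell_c_index(
    times: Sequence[float], events: Sequence[int], risk_scores: Sequence[float]
) -> float:
    """Harrell's C: share of comparable pairs whose predicted ordering is correct.

    A pair (i, j) is comparable when i had the event (returned to work) and j
    was followed up for longer (times[j] > times[i]). A higher risk score means
    an earlier predicted event. Pairs with tied times are skipped (simplification).
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored participant cannot anchor a comparable pair
        for j in range(len(times)):
            if times[j] > times[i]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def kaplan_meier_median(times: Sequence[float], events: Sequence[int]) -> Optional[float]:
    """Smallest event time at which the Kaplan-Meier estimate S(t) is 0.5 or below."""
    surv = Fraction(1)  # exact arithmetic avoids rounding trouble at S(t) = 0.5
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        d = sum(1 for u, e in zip(times, events) if u == t and e)
        surv *= 1 - Fraction(d, at_risk)
        if surv <= Fraction(1, 2):
            return t
    return None  # median not reached during follow-up


times = [5, 3, 8, 2]
events = [1, 1, 0, 1]
risk = [0.9, 0.4, 0.1, 0.8]  # hypothetical model outputs
print(harrell_c_index(times, events, risk))  # 0.6666666666666666 (4 of 6 pairs concordant)
print(kaplan_meier_median(times, events))  # 3
```

A C-index of 0.5 corresponds to random ordering and 1.0 to perfect ranking. On that scale, the reported values between 0.51 and 0.71 indicate weak to moderate discrimination.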

Conclusion: Fitting a range of survival models, with and without the proportional hazards assumption, provides useful insight and a better interpretation of coefficients affected by non-linearities. However, the better fit of the machine learning approaches does not translate into substantially higher predictive power or accuracy. Further tuning of the machine learning algorithms may improve results.

Source journal

BMC Medical Research Methodology (Medicine / Health Care)

CiteScore: 6.50
Self-citation rate: 2.50%
Articles published: 298
Review time: 3-8 weeks

Journal description: BMC Medical Research Methodology is an open access journal publishing original peer-reviewed research articles in methodological approaches to healthcare research. Articles on the methodology of epidemiological research, clinical trials and meta-analysis/systematic review are particularly encouraged, as are empirical studies of the associations between choice of methodology and study outcomes. BMC Medical Research Methodology does not aim to publish articles describing scientific methods or techniques: these should be directed to the BMC journal covering the relevant biomedical subject area.