The performance of prognostic models depended on the choice of missing value imputation algorithm: a simulation study

Journal of Clinical Epidemiology, vol. 176, Article 111539. Published 2024-12-01 (Epub 2024-09-24). DOI: 10.1016/j.jclinepi.2024.111539

Impact factor 5.2; Zone 2 (Medicine); JCR Q1 (Health Care Sciences & Services)

Manja Deforth, Georg Heinze, Ulrike Held

Abstract

Objectives

The development of clinical prediction models is often impeded by missing values in the predictors. Various methods for imputing missing values before modeling have been proposed. Some are based on variants of multiple imputation by chained equations, while others are based on single imputation. These methods may include elements of flexible modeling or machine learning algorithms, and user-friendly software packages are available for some of them. The aim of this study was to investigate by simulation whether some of these methods consistently outperform others in terms of the performance measures of clinical prediction models.
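To make the idea of chained equations concrete, here is a minimal NumPy sketch. It is not the algorithm implemented in mice or aregImpute: each incomplete variable is repeatedly imputed from a linear regression on all other variables plus Gaussian noise, and the whole procedure is repeated to obtain several completed data sets. The function name `chained_equations_impute` and its parameters are hypothetical. Unlike proper multiple imputation, the regression coefficients are not drawn from their posterior distribution, so between-imputation variability is understated.

```python
import numpy as np


def chained_equations_impute(X, n_imputations=5, n_cycles=10, seed=0):
    """Return a list of completed copies of X, where NaN marks a missing value.

    Hypothetical, simplified chained-equations sketch: stochastic linear
    regression imputation, cycled over the incomplete columns. Regression
    coefficients are NOT drawn from their posterior (unlike mice's 'norm').
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    rng = np.random.default_rng(seed)
    completed = []
    for _ in range(n_imputations):
        # Start every chain from the observed column means.
        Xc = np.where(missing, np.nanmean(X, axis=0), X)
        for _ in range(n_cycles):
            for j in range(X.shape[1]):
                miss_j = missing[:, j]
                if not miss_j.any():
                    continue  # complete column: nothing to impute
                obs = ~miss_j
                # Regress column j on all other (currently completed) columns.
                design = np.column_stack([np.ones(len(Xc)), np.delete(Xc, j, axis=1)])
                beta, *_ = np.linalg.lstsq(design[obs], X[obs, j], rcond=None)
                resid = X[obs, j] - design[obs] @ beta
                sigma = np.sqrt(resid @ resid / (obs.sum() - design.shape[1]))
                # Draw imputations: prediction plus residual noise.
                Xc[miss_j, j] = design[miss_j] @ beta + rng.normal(0.0, sigma, miss_j.sum())
        completed.append(Xc)
    return completed
```

Single imputation corresponds to `n_imputations=1`; with several imputations, a model is fitted to each completed data set and the results are combined.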

Study Design and Setting

We simulated development and validation cohorts by mimicking the observed distributions of the predictors and the outcome variable of a real data set. In the development cohorts, missing predictor values were generated in 36 scenarios defined by the missingness mechanism and the proportion of noncomplete cases. We applied three imputation algorithms available in R software (R Foundation for Statistical Computing, Vienna, Austria): mice, aregImpute, and missForest. These algorithms differed in their use of linear models, flexible models, or random forests; in the way they sample from the predictive posterior distribution; and in whether they generate a single imputed data set or multiple ones. For multiple imputation, we also investigated the impact of the number of imputations. Logistic regression models were fitted to the simulated development cohorts before missing value generation (full data analysis) and after it (complete case analysis), and to the imputed data. Prognostic model performance was evaluated in validation cohorts without missing values using the scaled Brier score, the c-statistic, the calibration intercept and slope, and the mean absolute prediction error. The performance of the full data analysis was considered ideal.
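Three of the performance measures named above can be illustrated with a short NumPy sketch. It assumes the usual definitions: the scaled Brier score as 1 minus the ratio of the model's Brier score to that of a non-informative model that always predicts the observed event rate; the c-statistic as the concordance probability (the area under the ROC curve); and the mean absolute prediction error as the average absolute difference between estimated and true risks, which are known in a simulation. The function names are illustrative, and the exact definitions used in the study may differ in detail.

```python
import numpy as np


def scaled_brier(y, p):
    """1 - Brier(model) / Brier(model that always predicts the event rate)."""
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    brier = np.mean((y - p) ** 2)
    rate = y.mean()
    brier_max = rate * (1.0 - rate)  # Brier score of the non-informative model
    return 1.0 - brier / brier_max


def c_statistic(y, p):
    """P(risk of a random event > risk of a random non-event); ties count 1/2.

    Uses all event/non-event pairs, so memory grows with n_events * n_nonevents.
    """
    y = np.asarray(y)
    p = np.asarray(p, dtype=float)
    diff = p[y == 1][:, None] - p[y == 0][None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size


def mean_absolute_prediction_error(p_true, p_hat):
    """Average |estimated risk - true risk|; requires the simulated true risks."""
    return np.mean(np.abs(np.asarray(p_hat, dtype=float) - np.asarray(p_true, dtype=float)))
```

For example, with outcomes `[0, 0, 1, 1]` and predicted risks `[0.1, 0.4, 0.35, 0.8]`, three of the four event/non-event pairs are correctly ordered, giving a c-statistic of 0.75.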

Results

None of the imputation methods achieved the predictive accuracy that the model would have attained without missing values. In general, complete case analysis yielded the worst performance, and the deviation from ideal performance increased with increasing percentage of missingness and decreasing sample size. Across all scenarios and performance measures, aregImpute and mice, both with 100 imputations, yielded the highest predictive accuracy. Surprisingly, aregImpute outperformed full data analysis in achieving calibration slopes very close to one across all scenarios and outcome models. The improvement in the performance of mice with 100 rather than five imputations was only marginal. The differences between the imputation methods decreased with increasing sample size and decreasing proportion of noncomplete cases.
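Because several of these results concern calibration, a minimal sketch of how the calibration intercept and slope are commonly estimated may help. The validation outcome is regressed on the logit of the predicted risk: for the intercept, the slope is fixed at one through an offset; for the slope, it is estimated freely. A slope below one indicates predictions that are too extreme (typical of overfitting), and a slope above one indicates predictions that are too moderate. The code uses a hand-written Newton-Raphson fit to stay self-contained; the function names are hypothetical, and this is the standard logistic recalibration approach, not a reproduction of the study's code.

```python
import numpy as np


def _logistic_newton(X, y, offset, n_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression with a fixed offset (Newton-Raphson)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(offset + X @ beta)))
        grad = X.T @ (y - mu)                          # score vector
        hess = X.T @ (X * (mu * (1.0 - mu))[:, None])  # observed information
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def calibration_intercept_slope(y, p):
    """Return (calibration intercept, calibration slope) for risks p and outcomes y."""
    y = np.asarray(y, dtype=float)
    p = np.clip(np.asarray(p, dtype=float), 1e-12, 1.0 - 1e-12)
    lp = np.log(p / (1.0 - p))  # predictions on the logit scale
    ones = np.ones((len(y), 1))
    # Intercept: logit P(y = 1) = a + lp, slope fixed at 1 via the offset.
    intercept = _logistic_newton(ones, y, offset=lp)[0]
    # Slope: logit P(y = 1) = a + b * lp, both estimated.
    slope = _logistic_newton(np.column_stack([ones, lp]), y, offset=np.zeros(len(y)))[1]
    return intercept, slope
```

If the predicted logits are twice as extreme as the true ones, the estimated slope is close to 0.5; a well-calibrated model gives an intercept near zero and a slope near one.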

Conclusion

In our simulation study, model calibration was more affected than model discrimination by the choice of imputation method. While differences in model performance after imputation were generally small, multiple imputation methods such as mice and aregImpute, which can handle linear or nonlinear associations between predictors and outcome, are an attractive and reliable choice in most situations.

Source journal

Journal of Clinical Epidemiology (Medicine: Public, Environmental & Occupational Health)

CiteScore: 12.00
Self-citation rate: 6.90%
Articles published: 320
Review time: 44 days

Journal description: The Journal of Clinical Epidemiology strives to enhance the quality of clinical and patient-oriented healthcare research by advancing and applying innovative methods in conducting, presenting, synthesizing, disseminating, and translating research results into optimal clinical practice. Special emphasis is placed on training new generations of scientists and clinical practice leaders.