Comparing imputation approaches to handle systematically missing inputs in risk calculators.

PLOS Digital Health (IF 7.7). Pub Date: 2025-01-30; eCollection Date: 2025-01-01. DOI: 10.1371/journal.pdig.0000712

Anja Mühlemann, Philip Stange, Antoine Faul, Serena Lozza-Fiacco, Rowan Iskandar, Manuela Moraru, Susanne Theis, Petra Stute, Ben D Spycher, David Ginsbourger

PLOS Digital Health, vol. 4, no. 1, e0000712. Full text (PMC, PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11781665/pdf/


Abstract

Risk calculators based on statistical and/or mechanistic models have flourished and are increasingly available for a variety of diseases. However, in day-to-day practice their use may be hampered by missing input variables. Certain measurements needed to calculate disease risk may be difficult to acquire, e.g. because they require blood draws, and may therefore be systematically missing in the population of interest. We compare several deterministic and probabilistic imputation approaches for obtaining surrogates of risk-calculator predictions while accounting for the uncertainty caused by systematically missing inputs. The considered approaches predict missing inputs from available ones; in the case of probabilistic imputation, this leads to a probabilistic prediction of the risk. We compare the methods using scoring techniques for forecast evaluation, with a focus on the Brier score and the continuous ranked probability score (CRPS). We also discuss the classification of patients into risk groups defined by thresholding predicted probabilities. While the considered procedures are not meant to replace fully informed risk calculations, using them to obtain a first indication of the risk distribution when at least one input parameter is missing may find useful applications in medical practice. To illustrate this, we use the SCORE2 risk calculator for cardiovascular disease and a data set of medical records from 359 women, obtained from the gynecology department at the Inselspital in Bern, Switzerland. Using this data set, we mimic the situation where some input parameters (blood lipids and blood pressure) are systematically missing, and compute the SCORE2 risk by probabilistic imputation of the missing variables based on the remaining input variables. We compare this approach to established imputation techniques such as MICE (multiple imputation by chained equations) by means of scoring rules, and visualize how probabilistic imputation can be used in sample size considerations.
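The abstract does not spell out the imputation models, so the following is only a minimal sketch of the general idea under stated assumptions. One hidden input (systolic blood pressure, SBP) is imputed from one observed input (age) with a Gaussian linear regression fitted on complete records, and the resulting predictive distribution is pushed through a risk function. `toy_risk` is a hypothetical logistic model with made-up coefficients, not SCORE2; the patients are synthetic, not the Bern data; and scoring the probability of falling above a risk threshold with the Brier score is one assumed way of evaluating risk-group classification. Deterministic (single) imputation plugs in the conditional mean and yields a point forecast, whose CRPS reduces to the absolute error, while probabilistic imputation yields an ensemble of risks scored with the sample-based CRPS. MICE is not implemented here.

```python
import math
import random
import statistics


def toy_risk(age: float, sbp: float) -> float:
    """Hypothetical 10-year risk from age (years) and systolic BP (mmHg).

    Made-up logistic coefficients for illustration only; this is NOT SCORE2.
    """
    lp = -9.0 + 0.08 * age + 0.02 * sbp
    return 1.0 / (1.0 + math.exp(-lp))


def fit_gaussian_regression(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b, residual standard deviation)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, math.sqrt(rss / (len(xs) - 2))


def probabilistic_risk(age, model, n_draws, rng):
    """Draw the missing SBP from N(a + b*age, sigma^2) and push each draw through the risk model."""
    a, b, sigma = model
    return [toy_risk(age, rng.gauss(a + b * age, sigma)) for _ in range(n_draws)]


def deterministic_risk(age, model):
    """Single imputation: plug in the conditional mean of SBP."""
    a, b, _ = model
    return toy_risk(age, a + b * age)


def crps_ensemble(samples, y):
    """CRPS of the empirical distribution of `samples` at outcome y: E|X-y| - 0.5 E|X-X'|."""
    m = len(samples)
    term1 = sum(abs(x - y) for x in samples) / m
    term2 = sum(abs(xi - xj) for xi in samples for xj in samples) / (2 * m * m)
    return term1 - term2


def brier(probs, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return statistics.fmean((p - o) ** 2 for p, o in zip(probs, outcomes))


def simulate(seed=0, n_train=300, n_test=50, n_draws=200, threshold=0.10):
    """Compare single vs probabilistic imputation of a hidden SBP on synthetic patients."""
    rng = random.Random(seed)

    def draw_patient():
        # Synthetic data generator (an assumption, not the Bern cohort).
        age = rng.uniform(40, 70)
        return age, 90 + 0.8 * age + rng.gauss(0, 12)

    train = [draw_patient() for _ in range(n_train)]
    test = [draw_patient() for _ in range(n_test)]
    model = fit_gaussian_regression([a for a, _ in train], [s for _, s in train])

    crps_p, crps_d, p_high, d_high, high = [], [], [], [], []
    for age, sbp in test:
        y = toy_risk(age, sbp)  # reference "fully informed" risk, SBP observed
        ens = probabilistic_risk(age, model, n_draws, rng)
        det = deterministic_risk(age, model)
        crps_p.append(crps_ensemble(ens, y))
        crps_d.append(abs(det - y))  # CRPS of a point forecast is its absolute error
        p_high.append(sum(r >= threshold for r in ens) / n_draws)  # P(high-risk group)
        d_high.append(1.0 if det >= threshold else 0.0)
        high.append(1.0 if y >= threshold else 0.0)

    return {
        "crps_probabilistic": statistics.fmean(crps_p),
        "crps_deterministic": statistics.fmean(crps_d),
        "brier_probabilistic": brier(p_high, high),
        "brier_deterministic": brier(d_high, high),
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.4f}")
```

Running the script prints the mean CRPS and Brier score for both approaches on the synthetic cohort. Because the toy risk is monotone in SBP, plugging in the conditional mean gives the median of the imputed risk distribution, so the comparison isolates what is gained by reporting the spread of plausible risks rather than a single number.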
