Anja Mühlemann, Philip Stange, Antoine Faul, Serena Lozza-Fiacco, Rowan Iskandar, Manuela Moraru, Susanne Theis, Petra Stute, Ben D Spycher, David Ginsbourger
Comparing imputation approaches to handle systematically missing inputs in risk calculators
PLOS Digital Health 4(1): e0000712. Published 2025-01-30. DOI: https://doi.org/10.1371/journal.pdig.0000712. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11781665/pdf/
Citations: 0
Abstract
Risk calculators based on statistical and/or mechanistic models have flourished and are increasingly available for a variety of diseases. In day-to-day practice, however, their use may be hampered by missing input variables. Certain measurements needed to calculate disease risk may be difficult to acquire, e.g. because they require blood draws, and may be systematically missing in the population of interest. We compare several deterministic and probabilistic imputation approaches to obtain surrogate predictions from risk calculators while accounting for the uncertainty due to systematically missing inputs. The approaches considered predict missing inputs from available ones. With probabilistic imputation, this yields a probabilistic prediction of the risk. We compare the methods using scoring techniques for forecast evaluation, with a focus on the Brier score and the continuous ranked probability score (CRPS). We also discuss the classification of patients into risk groups defined by thresholding predicted probabilities. While the procedures considered are not meant to replace fully informed risk calculations, they can give a first indication of the risk distribution when at least one input parameter is missing, which may find useful applications in medical practice. To illustrate this, we use the SCORE2 risk calculator for cardiovascular disease and a data set comprising medical data from 359 women, obtained from the gynecology department at the Inselspital in Bern, Switzerland. Using this data set, we mimic the situation where some input parameters, namely blood lipids and blood pressure, are systematically missing, and compute the SCORE2 risk by probabilistic imputation of the missing variables based on the remaining input variables. We compare this approach to established imputation techniques such as multiple imputation by chained equations (MICE) by means of scoring rules, and in turn visualize how probabilistic imputation can be used in sample size considerations.
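The idea of pushing imputed inputs through a risk calculator and scoring the resulting predictive distribution can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_risk` is a hypothetical logistic model and not the SCORE2 formula, the Gaussian model for systolic blood pressure given age uses made-up parameters, and the 5% threshold is arbitrary. The CRPS is estimated from samples via the standard identity CRPS = E|X - y| - 0.5 E|X - X'|, and the Brier score is applied here to the event "risk at or above the threshold", which is one possible reading of the thresholding step.

```python
import math
import random


def toy_risk(age, sbp):
    """Hypothetical logistic risk model (assumption; NOT the SCORE2 formula)."""
    z = -11.0 + 0.08 * age + 0.03 * sbp
    return 1.0 / (1.0 + math.exp(-z))


def impute_sbp_samples(age, n_samples, rng):
    """Draw systolic blood pressure values given age.

    Assumed model: SBP | age ~ Normal(100 + 0.5 * age, 12^2). In practice the
    parameters would be estimated from patients with complete inputs.
    """
    mean = 100.0 + 0.5 * age
    return [rng.gauss(mean, 12.0) for _ in range(n_samples)]


def risk_samples(age, n_samples=2000, seed=0):
    """Probabilistic imputation: propagate imputed inputs through the calculator."""
    rng = random.Random(seed)
    return [toy_risk(age, sbp) for sbp in impute_sbp_samples(age, n_samples, rng)]


def crps_from_samples(samples, y):
    """Sample-based CRPS estimate: E|X - y| - 0.5 * E|X - X'|."""
    xs = sorted(samples)
    n = len(xs)
    term1 = sum(abs(x - y) for x in xs) / n
    # Sum over pairs i < j of (x_j - x_i), computed from the sorted values.
    pair_sum = sum((2 * i - n + 1) * x for i, x in enumerate(xs))
    term2 = 2.0 * pair_sum / (n * n)
    return term1 - 0.5 * term2


def exceedance_probability(samples, threshold):
    """Fraction of sampled risks at or above the threshold."""
    return sum(s >= threshold for s in samples) / len(samples)


def brier_score(prob, outcome):
    """Brier score of a single probability forecast for a 0/1 outcome."""
    return (prob - outcome) ** 2


if __name__ == "__main__":
    age, measured_sbp, threshold = 55, 140.0, 0.05  # illustrative values
    samples = risk_samples(age)
    reference = toy_risk(age, measured_sbp)  # fully informed risk
    point = toy_risk(age, 100.0 + 0.5 * age)  # deterministic (mean) imputation

    print("CRPS, probabilistic imputation:", crps_from_samples(samples, reference))
    # For a point forecast the CRPS reduces to the absolute error.
    print("CRPS, deterministic imputation:", abs(point - reference))
    p_high = exceedance_probability(samples, threshold)
    print("Brier score, high-risk event:", brier_score(p_high, int(reference >= threshold)))
```

Lower scores are better for both rules. Because the CRPS of a point forecast equals its absolute error, deterministic and probabilistic imputation can be compared on the same scale.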