Bang Wang, Yu Cheng, Mitchell H Gail, Jason Fine, Ruth M Pfeiffer
Title: Predicting absolute risk for a person with missing risk factors
Journal: Statistical Methods in Medical Research, pages 557-573
Published: 2024-04-01 (Epub 2024-03-01)
DOI: 10.1177/09622802241227945 (https://doi.org/10.1177/09622802241227945)
Abstract
We compared methods to project absolute risk, the probability of experiencing the outcome of interest in a given projection interval accommodating competing risks, for a person from the target population with missing predictors. Without missing data, a perfectly calibrated model gives unbiased absolute risk estimates in a new target population, even if the predictor distribution differs from the training data. However, if predictors are missing in target population members, a reference dataset with complete data is needed to impute them and to estimate absolute risk, conditional only on the observed predictors. If the predictor distributions of the reference data and the target population differ, this approach yields biased estimates. We compared the bias and mean squared error of absolute risk predictions for seven methods that assume predictors are missing at random (MAR). Some methods imputed individual missing predictors; others imputed linear predictor combinations (risk scores). Simulations were based on real breast cancer predictor distributions and outcome data. We also analyzed a real breast cancer dataset. The largest bias for all methods resulted from different predictor distributions of the reference and target populations. No method was unbiased in this situation. Surprisingly, violating the MAR assumption did not induce severe biases. Most multiple imputation methods performed similarly and were less biased (but more variable) than a method that used a single expected risk score. Our work shows the importance of selecting predictor reference datasets similar to the target population to reduce bias in absolute risk predictions with missing risk factors.
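To make the contrast between averaging over imputed predictors and plugging in a single expected risk score concrete, here is a minimal Python sketch. It is not one of the seven methods evaluated in the paper. It assumes constant cause-specific hazards, one observed binary predictor, and one missing binary predictor. The coefficients in `BETA`, the hazards, and the rows of `REFERENCE` are all hypothetical values chosen for illustration.

```python
import math


def absolute_risk(score, base_hazard=0.002, competing_hazard=0.01, years=5.0):
    """Absolute risk of the event of interest within `years`.

    Assumes constant hazards (a simplification): the event hazard is
    base_hazard * exp(score), and the competing-risk hazard does not
    depend on the predictors. All default values are hypothetical.
    """
    event_hazard = base_hazard * math.exp(score)
    total_hazard = event_hazard + competing_hazard
    return event_hazard / total_hazard * (1.0 - math.exp(-total_hazard * years))


# Hypothetical log relative risks for the two predictors.
BETA = {"x_obs": 0.5, "x_mis": 0.9}

# Hypothetical complete reference data: rows of (x_obs, x_mis).
REFERENCE = [(0, 0), (0, 0), (0, 1), (1, 0), (1, 1), (1, 1), (1, 1), (0, 0)]


def risk_score(x_obs, x_mis):
    """Linear predictor combination (risk score)."""
    return BETA["x_obs"] * x_obs + BETA["x_mis"] * x_mis


def matching_values(x_obs):
    """Values of the missing predictor among reference rows with the same x_obs."""
    return [x_mis for (ref_obs, x_mis) in REFERENCE if ref_obs == x_obs]


def risk_averaged_over_reference(x_obs):
    """Average the absolute risk over the reference distribution of x_mis given x_obs."""
    values = matching_values(x_obs)
    return sum(absolute_risk(risk_score(x_obs, v)) for v in values) / len(values)


def risk_at_expected_score(x_obs):
    """Plug the single expected risk score (given x_obs) into the risk formula."""
    values = matching_values(x_obs)
    mean_score = sum(risk_score(x_obs, v) for v in values) / len(values)
    return absolute_risk(mean_score)


if __name__ == "__main__":
    for x in (0, 1):
        print(x, risk_averaged_over_reference(x), risk_at_expected_score(x))
```

Under these assumed values the two functions give different answers for the same person, because absolute risk is a nonlinear function of the risk score. Both also depend entirely on `REFERENCE`, so a reference dataset whose predictor distribution differs from the target population shifts either estimate.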
About the journal:
Statistical Methods in Medical Research is a peer-reviewed scholarly journal and is the leading vehicle for articles in all the main areas of medical statistics and an essential reference for all medical statisticians. This unique journal is devoted solely to statistics and medicine and aims to keep professionals abreast of the many powerful statistical techniques now available to the medical profession. This journal is a member of the Committee on Publication Ethics (COPE).