Multiple imputation of missing data under missing at random: including a collider as an auxiliary variable in the imputation model can induce bias

Frontiers in epidemiology. Pub Date: 2023-06-18. DOI: 10.1101/2023.06.16.23291497

E. Curnow, K. Tilling, J. Heron, R. Cornish, J. Carpenter


Citations: 1

Abstract

Epidemiological studies often have missing data, which are commonly handled by multiple imputation (MI). In MI, imputation models often include variables beyond those required for the substantive analysis ("auxiliary variables"). Auxiliary variables that predict the partially observed variables can reduce the standard error (SE) of the MI estimator and, if they also predict the probability that data are missing, reduce bias due to data being missing not at random. However, guidance for choosing auxiliary variables is lacking. We examine the consequences of a poorly chosen auxiliary variable: if it shares a common cause with the partially observed variable and a common cause with the probability that this variable is missing (i.e. it is a "collider"), its inclusion can induce bias in the MI estimator and may increase the SE. We quantify, both algebraically and by simulation, the magnitude of the bias and SE when either the exposure or the outcome is incomplete. When the substantive analysis outcome is partially observed, the bias can be substantial relative to the magnitude of the exposure coefficient. In settings in which complete records analysis is valid, the bias is smaller when the exposure is partially observed. However, the bias can be larger if the outcome also causes missingness in the exposure. When using MI, it is important to examine, through a combination of data exploration and consideration of plausible causal diagrams and missingness mechanisms, whether potential auxiliary variables are colliders.
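The collider mechanism described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' simulation design. It is a minimal example under assumed settings: a linear outcome model with a true exposure coefficient `beta`, an unmeasured variable `u` that causes both the outcome `y` and the auxiliary variable `z`, an unmeasured variable `w` that causes both `z` and the missingness of `y`, and a probit-type missingness mechanism that also depends on the exposure `x`. All names and coefficient values are hypothetical. For brevity the imputation step draws residual noise only and holds the fitted imputation-model parameters fixed, which is a simplification of proper MI; it affects the variance more than the point estimate when the sample is large.

```python
import numpy as np


def ols(design, y):
    """Least-squares coefficients for y regressed on the columns of design."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def simulate_collider_bias(n=200_000, beta=1.0, m=5, seed=0):
    """Compare exposure-coefficient estimates with and without a collider
    auxiliary variable in the imputation model (illustrative settings only)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)                  # exposure, fully observed
    u = rng.normal(size=n)                  # unmeasured cause of y and z
    w = rng.normal(size=n)                  # unmeasured cause of z and missingness
    y = beta * x + u + rng.normal(size=n)   # outcome, partially observed
    z = u + w + 0.5 * rng.normal(size=n)    # auxiliary variable (collider)
    # y is observed when a latent score is positive. Given x, missingness is
    # independent of y (missing at random), but not once z is conditioned on.
    observed = (x + w + rng.normal(size=n)) > 0
    missing = ~observed
    ones = np.ones(n)
    analysis_design = np.column_stack([ones, x])

    # Complete records analysis: regress y on x among observed records.
    cra = ols(analysis_design[observed], y[observed])[1]

    def mi_slope(predictors):
        # Fit the imputation model on records with observed y.
        design = np.column_stack([ones] + predictors)
        coef = ols(design[observed], y[observed])
        resid = y[observed] - design[observed] @ coef
        sigma = resid.std(ddof=design.shape[1])
        slopes = []
        for _ in range(m):
            # Impute missing y as fitted value plus a random residual draw.
            y_imp = y.copy()
            noise = sigma * rng.normal(size=missing.sum())
            y_imp[missing] = design[missing] @ coef + noise
            slopes.append(ols(analysis_design, y_imp)[1])
        # Pool the point estimates by averaging across imputed datasets.
        return float(np.mean(slopes))

    return {
        "cra": float(cra),
        "mi_without_z": mi_slope([x]),
        "mi_with_z": mi_slope([x, z]),
    }


if __name__ == "__main__":
    for name, estimate in simulate_collider_bias().items():
        print(f"{name}: {estimate:.3f}")
```

Under these assumptions, the complete records estimate and the MI estimate that omits `z` should both sit close to `beta`, while the MI estimate that includes `z` should move away from it. Setting the coefficient on `w` in either the `z` equation or the missingness score to zero removes the collider path and is a simple way to check that the discrepancy comes from that structure.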


