Multiple imputation of missing data under missing at random: including a collider as an auxiliary variable in the imputation model can induce bias

Frontiers in Epidemiology. Pub Date: 2023-09-15. DOI: 10.3389/fepid.2023.1237447

Elinor Curnow, Kate Tilling, Jon E. Heron, Rosie P. Cornish, James R. Carpenter



Abstract

Epidemiological studies often have missing data, which are commonly handled by multiple imputation (MI). In MI, in addition to those required for the substantive analysis, imputation models often include other variables (“auxiliary variables”). Auxiliary variables that predict the partially observed variables can reduce the standard error (SE) of the MI estimator and, if they also predict the probability that data are missing, reduce bias due to data being missing not at random. However, guidance for choosing auxiliary variables is lacking. We examine the consequences of a poorly chosen auxiliary variable: if it shares a common cause with the partially observed variable and the probability that it is missing (i.e., it is a “collider”), its inclusion can induce bias in the MI estimator and may increase the SE. We quantify, both algebraically and by simulation, the magnitude of bias and SE when either the exposure or outcome is incomplete. When the substantive analysis outcome is partially observed, the bias can be substantial, relative to the magnitude of the exposure coefficient. In settings in which a complete records analysis is valid, the bias is smaller when the exposure is partially observed. However, bias can be larger if the outcome also causes missingness in the exposure. When using MI, it is important to examine, through a combination of data exploration and considering plausible causal diagrams and missingness mechanisms, whether potential auxiliary variables are colliders.
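The collider mechanism described in the abstract can be illustrated with a small simulation. The following Python sketch is not the authors' simulation design: the data-generating model, the variable names (`x`, `y`, `z`, `u`, `w`), the coefficient values, and the function `simulate` are all assumptions chosen for illustration. An unmeasured variable `u` causes both the outcome `y` and the auxiliary variable `z`, and a second unmeasured variable `w` causes both `z` and the probability that `y` is observed, so `z` is a collider. Missingness in `y` also depends on the fully observed exposure `x`, so a complete records analysis remains valid. As a further simplification, imputations are drawn from a fitted linear model without drawing the model parameters, so only point estimates are compared, not standard errors.

```python
import numpy as np


def ols(design, y):
    """Least-squares coefficients for y regressed on the columns of design."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def simulate(n=200_000, beta=1.0, m=5, seed=0):
    """Compare exposure-coefficient estimates under an assumed collider setup."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)   # fully observed exposure
    u = rng.normal(size=n)   # unmeasured common cause of y and z
    w = rng.normal(size=n)   # unmeasured common cause of z and missingness
    y = beta * x + u         # outcome, partially observed below
    z = u + w                # candidate auxiliary variable (a collider)

    # y is observed with a probability that depends on x and w, not on y itself.
    p_observed = 1.0 / (1.0 + np.exp(-(-x + 2.0 * w)))
    observed = rng.random(n) < p_observed
    ones = np.ones(n)
    analysis_design = np.column_stack([ones, x])

    # Complete records analysis: regress y on x among observed records only.
    cra = ols(analysis_design[observed], y[observed])[1]

    def mi_slope(aux_columns):
        # Imputation model: y ~ x (+ auxiliary columns), fitted to observed records.
        design = np.column_stack([ones, x, *aux_columns])
        coef = ols(design[observed], y[observed])
        resid = y[observed] - design[observed] @ coef
        sigma = resid.std(ddof=design.shape[1])
        n_missing = int((~observed).sum())
        slopes = []
        for _ in range(m):
            y_completed = y.copy()
            y_completed[~observed] = (
                design[~observed] @ coef
                + rng.normal(scale=sigma, size=n_missing)
            )
            slopes.append(ols(analysis_design, y_completed)[1])
        # Pooled point estimate: the mean across the m completed datasets.
        return float(np.mean(slopes))

    return {
        "cra": float(cra),
        "mi_without_z": mi_slope([]),
        "mi_with_z": mi_slope([z]),
    }


if __name__ == "__main__":
    for name, estimate in simulate().items():
        print(f"{name}: {estimate:.3f}")
```

Under these assumptions, the complete records estimate and the MI estimate that omits `z` should both sit close to the true coefficient `beta`, while the MI estimate that includes `z` should be pulled away from it, because conditioning on `z` opens a path between `y` and its missingness through `u` and `w`.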


