Evaluation of Four Multiple Imputation Methods for Handling Missing Binary Outcome Data in the Presence of an Interaction between a Dummy and a Continuous Variable

Journal of Probability and Statistics (IF 1.3, Q3, Statistics & Probability) | Pub Date: 2021-05-17 | DOI: 10.1155/2021/6668822

Sara Javadi, A. Bahrampour, M. M. Saber, B. Garrusi, M. Baneshi

Journal of Probability and Statistics, vol. 2021, pp. 1-14. Journal article.

Citations: 7

Abstract

Multiple imputation by chained equations (MICE) is the most common method for imputing missing data. Within the MICE algorithm, imputation can be performed using a variety of parametric and nonparametric methods. By default, MICE implementations include variables in the imputation models as linear terms only, with no interactions, but omitting interaction terms may lead to biased results. Using simulated and real datasets, we investigated whether recursive partitioning creates appropriate variability between imputations and yields unbiased parameter estimates with appropriate confidence intervals. We compared four multiple imputation (MI) methods: predictive mean matching with an interaction term in the imputation model (MICE-Interaction), classification and regression trees (CART) for specifying the imputation model (MICE-CART), random forest (RF) within MICE (MICE-RF), and the MICE-Stratified method. For the real data, we selected a secondary dataset and devised an experimental design of 40 scenarios (2 × 5 × 4), which differed by the missingness mechanism (missing at random, MAR, and missing completely at random, MCAR), the rate of simulated missing data (10%, 20%, 30%, 40%, and 50%), and the imputation method (MICE-Interaction, MICE-CART, MICE-RF, and MICE-Stratified). We drew 300 samples of 700 observations each, with replacement, and then created missing data in each sample. The evaluation was based on raw bias (RB) and five other measures, each averaged over the repetitions. In the simulation study, we generated 1000 datasets with a sample size of 700 and created missing data once for each dataset. For all scenarios, the same criteria as for the real data were used to evaluate the performance of the methods.

We conclude that, when there is an interaction effect between a dummy and a continuous predictor, substantial gains are possible by using recursive partitioning for imputation compared with parametric methods; in addition, the MICE-Interaction method is always more efficient and convenient for preserving interaction effects than the other methods.
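The design described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the logistic coefficients, the form of the MAR mechanism (missingness in the outcome depending on the continuous predictor), and all function names are hypothetical. Raw bias is taken in its usual sense, the mean estimate over repetitions minus the true value. The imputation step itself is omitted.

```python
import itertools
import math
import random

MECHANISMS = ("MCAR", "MAR")
MISSING_RATES = (0.10, 0.20, 0.30, 0.40, 0.50)
METHODS = ("MICE-Interaction", "MICE-CART", "MICE-RF", "MICE-Stratified")


def scenario_grid():
    """All mechanism x rate x method combinations (2 x 5 x 4 = 40)."""
    return list(itertools.product(MECHANISMS, MISSING_RATES, METHODS))


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def simulate_dataset(n, rng, beta=(-0.5, 0.8, 0.5, 1.0)):
    """Binary outcome y from a logistic model with a dummy d, a continuous x,
    and their interaction d*x. The coefficients are illustrative assumptions."""
    b0, b_d, b_x, b_dx = beta
    rows = []
    for _ in range(n):
        d = 1 if rng.random() < 0.5 else 0
        x = rng.gauss(0.0, 1.0)
        p = logistic(b0 + b_d * d + b_x * x + b_dx * d * x)
        y = 1 if rng.random() < p else 0
        rows.append({"d": d, "x": x, "y": y})
    return rows


def make_missing(rows, rate, mechanism, rng):
    """Return a copy of rows with y set to None in round(rate * n) rows.

    MCAR: every row is equally likely to lose y.
    MAR (assumed form): rows with larger observed x are more likely to lose y.
    """
    n_missing = round(rate * len(rows))
    if mechanism == "MCAR":
        weights = [1.0] * len(rows)
    elif mechanism == "MAR":
        weights = [logistic(r["x"]) for r in rows]
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    # Weighted sampling without replacement (Efraimidis-Spirakis keys):
    # draw u ** (1 / w) per row and keep the rows with the largest keys.
    keys = sorted(
        ((rng.random() ** (1.0 / w), i) for i, w in enumerate(weights)),
        reverse=True,
    )
    missing = {i for _, i in keys[:n_missing]}
    return [dict(r, y=None) if i in missing else dict(r)
            for i, r in enumerate(rows)]


def raw_bias(estimates, true_value):
    """Mean of the estimates over repetitions minus the true value."""
    return sum(estimates) / len(estimates) - true_value


if __name__ == "__main__":
    rng = random.Random(2021)
    print(len(scenario_grid()))  # 40 scenarios
    data = simulate_dataset(700, rng)
    amputed = make_missing(data, 0.30, "MAR", rng)
    print(sum(r["y"] is None for r in amputed), "outcomes set to missing")
```

In a full replication, each amputed dataset would be passed to the four imputation methods, the analysis model (including the interaction term) would be refit on every imputed dataset, and `raw_bias` would be applied to the pooled interaction estimates across repetitions.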




Source journal: Journal of Probability and Statistics (Statistics & Probability)

Self-citation rate: 0.00%

Review time: 18 weeks