Practical strategies for handling breakdown of multiple imputation procedures.
Cattram D Nguyen, John B Carlin, Katherine J Lee
Emerging Themes in Epidemiology, 18(1):5, published 2021-04-01
DOI: https://doi.org/10.1186/s12982-021-00095-3
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8017730/pdf/
Abstract
Multiple imputation is a recommended method for handling incomplete data problems. One of the barriers to its successful use is the breakdown of the multiple imputation procedure, often due to numerical problems with the algorithms used within the imputation process. These problems frequently occur when imputation models contain large numbers of variables, especially with the popular approach of multivariate imputation by chained equations. This paper describes common causes of failure of the imputation procedure including perfect prediction and collinearity, focusing on issues when using Stata software. We outline a number of strategies for addressing these issues, including imputation of composite variables instead of individual components, introducing prior information and changing the form of the imputation model. These strategies are illustrated using a case study based on data from the Longitudinal Study of Australian Children.
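The two causes of failure named in the abstract, perfect prediction and collinearity, can be screened for before an imputation model is fitted. The following is a minimal sketch in Python, not the authors' procedure and not Stata's implementation: the function names (`perfect_prediction_levels`, `matrix_rank`, `is_collinear`) and the toy data are hypothetical and exist only for illustration. It assumes a binary outcome, a categorical predictor whose levels can be sorted, and missing values coded as `None`.

```python
from collections import defaultdict
from fractions import Fraction


def perfect_prediction_levels(outcome, predictor):
    """Return predictor levels in which the observed binary outcome never varies.

    Pairs with a missing value (None) on either side are skipped.
    """
    seen = defaultdict(set)
    for y, x in zip(outcome, predictor):
        if y is not None and x is not None:
            seen[x].add(y)
    return sorted(level for level, values in seen.items() if len(values) == 1)


def matrix_rank(rows):
    """Rank of a numeric matrix via exact Gaussian elimination on Fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below the current rank with a non-zero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every row below the pivot row.
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def is_collinear(rows):
    """True when the columns are linearly dependent (rank below column count)."""
    return bool(rows) and matrix_rank(rows) < len(rows[0])


# Toy data: within level "b" the outcome is always 1, so "b" predicts it perfectly.
outcome = [0, 1, 0, 1, 1, 1, None]
predictor = ["a", "a", "a", "b", "b", "b", "a"]
flagged_levels = perfect_prediction_levels(outcome, predictor)

# Toy design matrix: the third column is the sum of the first two.
design = [[1, 2, 3], [2, 1, 3], [0, 4, 4], [5, 5, 10]]
design_is_collinear = is_collinear(design)
```

A flagged level or a rank-deficient design matrix is only a warning sign; which of the strategies listed in the abstract is appropriate depends on the data and the analysis model.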
About the journal:
Emerging Themes in Epidemiology is an open access, peer-reviewed, online journal that aims to promote debate and discussion on practical and theoretical aspects of epidemiology. Combining statistical approaches with an understanding of the biology of disease, epidemiologists seek to elucidate the social, environmental and host factors related to adverse health outcomes. Although research findings from epidemiologic studies abound in traditional public health journals, little publication space is devoted to discussion of the practical and theoretical concepts that underpin them. Because of its immediate impact on public health, an openly accessible forum is needed in the field of epidemiology to foster such discussion.