Gaps in the usage and reporting of multiple imputation for incomplete data: findings from a scoping review of observational studies addressing causal questions.

BMC Medical Research Methodology (IF 3.9; Zone 3, Medicine; JCR Q1, Health Care Sciences & Services). Pub Date: 2024-09-04. DOI: 10.1186/s12874-024-02302-6

Rheanna M Mainzer, Margarita Moreno-Betancur, Cattram D Nguyen, Julie A Simpson, John B Carlin, Katherine J Lee



Abstract

Background: Missing data are common in observational studies and often occur in several of the variables required when estimating a causal effect, i.e. the exposure, outcome and/or variables used to control for confounding. Analyses involving multiple incomplete variables are not as straightforward as analyses with a single incomplete variable. For example, in the context of multivariable missingness, the standard missing data assumptions ("missing completely at random", "missing at random" [MAR], "missing not at random") are difficult to interpret and assess. It is not clear how the complexities that arise due to multivariable missingness are being addressed in practice. The aim of this study was to review how missing data are managed and reported in observational studies that use multiple imputation (MI) for causal effect estimation, with a particular focus on missing data summaries, missing data assumptions, primary and sensitivity analyses, and MI implementation.

Methods: We searched five top general epidemiology journals for observational studies that aimed to answer a causal research question and used MI, published between January 2019 and December 2021. Article screening and data extraction were performed systematically.

Results: Of the 130 studies included in this review, 108 (83%) derived an analysis sample by excluding individuals with missing data in specific variables (e.g., outcome) and 114 (88%) had multivariable missingness within the analysis sample. Forty-four (34%) studies provided a statement about missing data assumptions, 35 of which stated the MAR assumption, but only 11/44 (25%) studies provided a justification for these assumptions. The number of imputations, MI method and MI software were generally well-reported (71%, 75% and 88% of studies, respectively), while aspects of the imputation model specification were not clear for more than half of the studies. A secondary analysis that used a different approach to handle the missing data was conducted in 69/130 (53%) studies. Of these 69 studies, 68 (99%) lacked a clear justification for the secondary analysis.
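The percentages in the Results can be recomputed from the counts given there. The following is a minimal sketch of that arithmetic, not part of the review itself: the labels, the `percent` helper and the `check` function are hypothetical names introduced for illustration, and rounding to the nearest whole percent is an assumption about how the authors reported the figures.

```python
# Counts taken from the Results paragraph: (numerator, denominator, reported %).
# The labels are shorthand introduced for this sketch, not the authors' wording.
reported = {
    "derived analysis sample by exclusion": (108, 130, 83),
    "multivariable missingness in analysis sample": (114, 130, 88),
    "stated missing data assumptions": (44, 130, 34),
    "justified the stated assumptions": (11, 44, 25),
    "conducted a secondary analysis": (69, 130, 53),
    "secondary analysis lacked clear justification": (68, 69, 99),
}


def percent(numerator, denominator):
    """Return the proportion as a whole-number percentage (assumed rounding)."""
    return round(100 * numerator / denominator)


def check(table):
    """Map each label to True when the recomputed percentage matches the reported one."""
    return {label: percent(n, d) == pct for label, (n, d, pct) in table.items()}


results = check(reported)
for label, ok in results.items():
    n, d, pct = reported[label]
    status = "matches" if ok else "differs"
    print(f"{label}: {n}/{d} = {percent(n, d)}% ({status} reported {pct}%)")
```

Note that the denominators differ: the justification figure is out of the 44 studies that stated assumptions, and the final figure is out of the 69 studies with a secondary analysis, not out of all 130.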

Conclusion: Effort is needed to clarify the rationale for and improve the reporting of MI for estimation of causal effects from observational data. We encourage greater transparency in making and reporting analytical decisions related to missing data.


Source journal: BMC Medical Research Methodology (Medicine: Health Care)

CiteScore: 6.50
Self-citation rate: 2.50%
Articles published: 298
Review time: 3-8 weeks

About the journal: BMC Medical Research Methodology is an open access journal publishing original peer-reviewed research articles in methodological approaches to healthcare research. Articles on the methodology of epidemiological research, clinical trials and meta-analysis/systematic review are particularly encouraged, as are empirical studies of the associations between choice of methodology and study outcomes. BMC Medical Research Methodology does not aim to publish articles describing scientific methods or techniques: these should be directed to the BMC journal covering the relevant biomedical subject area.