Multiple Imputation for Longitudinal Data: A Tutorial.

Statistics in Medicine · Pub Date: 2025-02-10 · DOI: 10.1002/sim.10274 · IF 1.8 · CAS Zone 4 (Medicine) · JCR Q3 (Mathematical & Computational Biology)

Rushani Wijesuriya, Margarita Moreno-Betancur, John B Carlin, Ian R White, Matteo Quartagno, Katherine J Lee

Statistics in Medicine, volume 44, issues 3-4, article e10274. Publication type: Journal Article. PDF via PubMed Central: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11755704/pdf/

Abstract

Longitudinal studies are frequently used in medical research and involve collecting repeated measures on individuals over time. Observations from the same individual are invariably correlated and thus an analytic approach that accounts for this clustering by individual is required. While almost all research suffers from missing data, this can be particularly problematic in longitudinal studies as participation often becomes harder to maintain over time. Multiple imputation (MI) is widely used to handle missing data in such studies. When using MI, it is important that the imputation model is compatible with the proposed analysis model. In a longitudinal analysis, this implies that the clustering considered in the analysis model should be reflected in the imputation process. Several MI approaches have been proposed to impute incomplete longitudinal data, such as treating repeated measurements of the same variable as distinct variables or using generalized linear mixed imputation models. However, the uptake of these methods has been limited, as they require additional data manipulation and use of advanced imputation procedures. In this tutorial, we review the available MI approaches that can be used for handling incomplete longitudinal data, including where individuals are clustered within higher-level clusters. We illustrate implementation with replicable R and Stata code using a case study from the Childhood to Adolescence Transition Study.
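
One of the approaches named above treats repeated measurements of the same variable as distinct variables, which first requires reshaping the data from long format (one row per individual per wave) to wide format (one row per individual). The following is a minimal sketch of that reshaping step only, not of the tutorial's own R or Stata code. The function `long_to_wide`, the column names `id`, `wave`, and `score`, and the toy values are all hypothetical and chosen purely for illustration.

```python
from collections import defaultdict


def long_to_wide(rows, id_key, wave_key, value_key):
    """Reshape long-format records into wide format.

    Hypothetical helper for illustration: each wave of `value_key`
    becomes its own column, so that an imputation routine could treat
    the waves as distinct variables.
    """
    # Collect every wave that appears anywhere in the data.
    waves = sorted({row[wave_key] for row in rows})

    # Group the recorded values by individual, keyed by wave.
    by_id = defaultdict(dict)
    for row in rows:
        by_id[row[id_key]][row[wave_key]] = row[value_key]

    wide = []
    for ident in sorted(by_id):
        record = {id_key: ident}
        for wave in waves:
            # A wave with no row, or a row holding None, is a missing cell.
            record[f"{value_key}_w{wave}"] = by_id[ident].get(wave)
        wide.append(record)
    return wide


# Hypothetical toy data: individual 2 has no record at wave 2,
# and individual 1 has a record at wave 2 with a missing value.
long_rows = [
    {"id": 1, "wave": 1, "score": 10.0},
    {"id": 1, "wave": 2, "score": None},
    {"id": 1, "wave": 3, "score": 12.5},
    {"id": 2, "wave": 1, "score": 8.0},
    {"id": 2, "wave": 3, "score": 9.0},
]

wide_rows = long_to_wide(long_rows, "id", "wave", "score")
for record in wide_rows:
    print(record)
```

In the wide layout, the within-individual correlation is carried by the associations between the wave-specific columns, which is why a standard single-level imputation routine could in principle be applied to it. The mixed-model approaches mentioned in the abstract would instead keep the data in long format.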

Source journal

Statistics in Medicine (Medicine: Public, Environmental & Occupational Health)

CiteScore: 3.40

Self-citation rate: 10.00%

Articles published: 334

Review time: 2-4 weeks

About the journal: The journal aims to influence practice in medicine and its associated sciences through the publication of papers on statistical and other quantitative methods. Papers will explain new methods and demonstrate their application, preferably through a substantive, real, motivating example or a comprehensive evaluation based on an illustrative example. Alternatively, papers will report on case-studies where creative use or technical generalizations of established methodology is directed towards a substantive application. Reviews of, and tutorials on, general topics relevant to the application of statistics to medicine will also be published. The main criteria for publication are appropriateness of the statistical methods to a particular medical problem and clarity of exposition. Papers with primarily mathematical content will be excluded. The journal aims to enhance communication between statisticians, clinicians and medical researchers.