Analyzing missingness patterns in real-world data using the SMDI toolkit: application to a linked EHR-claims pharmacoepidemiology study.

IF 3.9 | Zone 3, Medicine | Q1 HEALTH CARE SCIENCES & SERVICES | BMC Medical Research Methodology | Pub Date: 2024-10-19 | DOI: 10.1186/s12874-024-02330-2
Sudha R Raman, Bradley G Hammill, Pamela A Shaw, Hana Lee, Sengwee Toh, John G Connolly, Kimberly J Dandreo, Vinit Nalawade, Fang Tian, Wei Liu, Jie Li, José J Hernández-Muñoz, Robert J Glynn, Rishi J Desai, Janick Weberpals
BMC Medical Research Methodology 24(1):246. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11490010/pdf/

Abstract

Background: Missing data in confounding variables are a frequent challenge when generating evidence from real-world data, including electronic health records (EHR). Our objective was to apply a recently published toolkit for characterizing missing data patterns and, based on the toolkit's results about likely missingness mechanisms, to illustrate the analytic decision-making process in an empirical case example.

Methods: We used the Structural Missing Data Investigations (SMDI) toolkit to characterize missing data patterns in a pharmacoepidemiology study comparing cardiovascular outcomes among older adults initiating sodium-glucose cotransporter-2 inhibitors (SGLT2i) versus dipeptidyl peptidase-4 inhibitors (DPP-4i). The study used a linked EHR-Medicare claims dataset of Duke Health patients (2015-2017) and focused on partially observed confounders from EHR data (HbA1c laboratory values and body mass index [BMI]). Our analysis combined SMDI's descriptive functions and diagnostic tests to explore missingness patterns and to select approaches for mitigating missingness. We used the findings from these investigations to inform the estimation of adjusted hazard ratios comparing the two medication classes.
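As a minimal illustration of the kind of descriptive summary referred to above, the Python sketch below computes the proportion of missing values per partially observed covariate and tabulates missingness patterns (which variables are missing together in the same patient). It is not the SMDI toolkit's interface: the toy records, the variable names, and the functions `missing_proportions` and `missingness_patterns` are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical toy records; None marks a value that is missing in the EHR.
# Values are illustrative only, not data from the study.
records = [
    {"hba1c": 7.9, "bmi": 31.0},
    {"hba1c": None, "bmi": 28.5},
    {"hba1c": None, "bmi": None},
    {"hba1c": 8.4, "bmi": 33.2},
    {"hba1c": None, "bmi": 29.9},
]
VARIABLES = ["hba1c", "bmi"]


def missing_proportions(rows, variables):
    """Fraction of rows in which each variable is missing (None)."""
    n = len(rows)
    return {v: sum(r[v] is None for r in rows) / n for v in variables}


def missingness_patterns(rows, variables):
    """Count each combination of missing (True) / observed (False) flags,
    ordered as in `variables`."""
    return Counter(tuple(r[v] is None for v in variables) for r in rows)


print(missing_proportions(records, VARIABLES))
# {'hba1c': 0.6, 'bmi': 0.2}
print(missingness_patterns(records, VARIABLES))
```

A pattern table like this shows, for example, whether BMI tends to be missing only in patients who also lack HbA1c, which a per-variable percentage alone would hide.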

Results: High levels of missingness were noted for important confounding variables, including HbA1c (63.6%) and BMI (16.5%). The diagnostic tests characterized: 1) the distributions of patient characteristics, exposure, and outcome between patients with and without an observed value of the partially observed covariate; 2) the ability to predict missingness from observed covariates; and 3) whether the missingness of a partially observed covariate is differential with respect to the outcome. There was evidence that missingness could be sufficiently explained by observed data, which supported the use of multiple imputation by chained equations with random forests to address missing confounder data when estimating treatment effects. After multiple imputation, effect estimates aligned more closely with those of previous studies.
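The first two diagnostics listed above can be illustrated with a minimal Python sketch on toy data. `asmd` compares a covariate between patients with and without an observed value (an absolute standardized mean difference near 0 suggests similar distributions; values above roughly 0.1 are often read as meaningful imbalance), and `auc` measures how well observed covariates predict the missingness indicator (an AUC near 0.5 means missingness is not predictable from those covariates). For brevity the sketch fits a hand-rolled logistic regression rather than whatever model SMDI itself uses; all function names and data are hypothetical, and this is not the toolkit's interface. The third diagnostic (whether missingness is differential with respect to the outcome) would additionally require an outcome model and is not shown.

```python
import math
from statistics import mean, variance


def asmd(x_missing, x_observed):
    """Absolute standardized mean difference between the two groups, using the
    pooled SD sqrt((s1^2 + s2^2) / 2). Each group needs at least 2 values."""
    pooled_sd = math.sqrt((variance(x_missing) + variance(x_observed)) / 2)
    return abs(mean(x_missing) - mean(x_observed)) / pooled_sd


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient-descent logistic regression of the missingness indicator
    y (1 = missing) on covariate rows X. Returns [intercept, *slopes]."""
    n = len(X)
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted probability minus label
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def linear_scores(w, X):
    """Linear predictor; AUC depends only on the ranking, so no sigmoid is needed."""
    return [w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)) for xi in X]


def auc(scores, labels):
    """Mann-Whitney AUC: chance that a missing record (label 1) outscores an
    observed record (label 0); ties count one half."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Diagnostic 1 (toy data): age among patients with missing vs observed HbA1c.
age_missing, age_observed = [68, 81, 70], [72, 75]
print(round(asmd(age_missing, age_observed), 3))  # 0.097

# Diagnostic 2 (toy data): one standardized covariate; y = 1 if HbA1c is missing.
X = [[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]]
y = [0, 0, 1, 0, 1, 1]
w = fit_logistic(X, y)
print(round(auc(linear_scores(w, X), y), 3))
```

A high AUC indicates that missingness depends on observed covariates, which is compatible with (but cannot prove) a missing-at-random mechanism; an imputation model can then condition on those covariates.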

Conclusions: We demonstrated the practical application of the SMDI toolkit in a real-world setting. Applying the SMDI toolkit, and the resulting insights into potential missingness patterns, can inform the choice of appropriate analytic methods and increase the transparency of how research handles missing data. This type of approach can inform analytic decision-making and may increase our ability to generate evidence from real-world data.

Source journal: BMC Medical Research Methodology (Medicine: Health Care)
CiteScore: 6.50
Self-citation rate: 2.50%
Articles published: 298
Review time: 3-8 weeks
Journal description: BMC Medical Research Methodology is an open access journal publishing original peer-reviewed research articles in methodological approaches to healthcare research. Articles on the methodology of epidemiological research, clinical trials and meta-analysis/systematic review are particularly encouraged, as are empirical studies of the associations between choice of methodology and study outcomes. BMC Medical Research Methodology does not aim to publish articles describing scientific methods or techniques: these should be directed to the BMC journal covering the relevant biomedical subject area.