Validation Assessment of Privacy-Preserving Synthetic Electronic Health Record Data: Comparison of Original Versus Synthetic Data on Real-World COVID-19 Vaccine Effectiveness.

Impact factor: 2.4; Zone 4 (Medicine); JCR Q3 (Pharmacology & Pharmacy); Pharmacoepidemiology and Drug Safety; Pub Date: 2024-10-01; DOI: 10.1002/pds.70019
Echo Wang, Katrina Mott, Hongtao Zhang, Sivan Gazit, Gabriel Chodick, Mehmet Burcu
Pharmacoepidemiology and Drug Safety, volume 33, issue 10, article e70019. Publication type: Journal Article.
Citations: 0

Abstract

Purpose: To assess the validity of privacy-preserving synthetic data by comparing results from synthetic versus original EHR data analysis.

Methods: A published retrospective cohort study by Maccabi Healthcare Services in Israel on the real-world effectiveness of COVID-19 vaccines was replicated using synthetic data generated from the same source, and the results were compared between the synthetic and original datasets. The endpoints were COVID-19 infection, symptomatic COVID-19 infection, and hospitalization due to infection; they were also assessed in several demographic and clinical subgroups. Several metrics were used to compare synthetic and original estimates: standardized mean differences (SMD), decision agreement, estimate agreement, confidence interval overlap, and the Wald test. Synthetic data were generated five times to assess the stability of the results.
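The abstract names these agreement metrics without defining them. The sketch below shows one common way such metrics are computed. The definitions are assumptions taken from general usage in the synthetic-data literature, not from the study: SMD for a binary covariate, confidence interval overlap as the average share of each interval covered by the intersection, estimate agreement as the synthetic point estimate falling inside the original interval, and decision agreement as both analyses reaching the same direction and significance. All function names are hypothetical.

```python
import math

def smd_binary(p_orig: float, p_synth: float) -> float:
    """Standardized mean difference between two proportions (assumed definition)."""
    pooled_var = (p_orig * (1 - p_orig) + p_synth * (1 - p_synth)) / 2
    if pooled_var == 0:
        return 0.0
    return (p_synth - p_orig) / math.sqrt(pooled_var)

def ci_overlap(orig_ci: tuple, synth_ci: tuple) -> float:
    """Average fraction of each interval covered by their intersection.

    The value is negative when the two intervals do not intersect.
    """
    lo = max(orig_ci[0], synth_ci[0])
    hi = min(orig_ci[1], synth_ci[1])
    width = hi - lo
    return 0.5 * (width / (orig_ci[1] - orig_ci[0])
                  + width / (synth_ci[1] - synth_ci[0]))

def estimate_agreement(synth_est: float, orig_ci: tuple) -> bool:
    """True when the synthetic point estimate lies inside the original interval."""
    return orig_ci[0] <= synth_est <= orig_ci[1]

def decision_agreement(orig_est, orig_ci, synth_est, synth_ci, null=0.0) -> bool:
    """True when both analyses agree on direction and statistical significance."""
    def decision(est, ci):
        if ci[0] <= null <= ci[1]:
            return 0                      # not significant
        return 1 if est > null else -1    # significant, with direction
    return decision(orig_est, orig_ci) == decision(synth_est, synth_ci)
```

For ratio measures such as hazard ratios or odds ratios, the `null` argument would be 1.0 rather than 0.0. Any numbers passed to these functions for illustration are made up and are not results from the study.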

Results: The distributions of demographic and clinical characteristics differed very little between datasets (SMD < 0.01). For vaccine effectiveness expressed as relative risk reduction, synthetic and original data showed 100% decision agreement, 100% estimate agreement, and high confidence interval overlap (88.7%-99.7%) in all five replicates across all subgroups. Similar findings were obtained for vaccine effectiveness against symptomatic COVID-19 infection. For the hazard ratios for COVID-19-related hospitalization and the odds ratios for symptomatic COVID-19 infection, Wald tests suggested no significant difference between the respective effect estimates in all five replicates for all patient subgroups, but there were disagreements on the estimate and decision agreement metrics in some subgroups and replicates.
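The abstract does not say how the Wald test was constructed. A minimal sketch of one plausible form follows: it compares two ratio estimates on the log scale, recovers each standard error from the reported 95% confidence interval, and treats the two estimates as independent. The independence assumption is a simplification, since synthetic data are derived from the original source. The function names are hypothetical.

```python
import math

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval

def se_from_ci(lower: float, upper: float) -> float:
    """Standard error of a log ratio, recovered from its 95% confidence limits."""
    return (math.log(upper) - math.log(lower)) / (2 * Z_95)

def wald_test_ratio(est_orig, ci_orig, est_synth, ci_synth):
    """Wald z statistic and two-sided p-value for the difference of two log ratios.

    Assumes the two estimates are independent, which is a simplification here.
    """
    se_o = se_from_ci(*ci_orig)
    se_s = se_from_ci(*ci_synth)
    z = (math.log(est_synth) - math.log(est_orig)) / math.sqrt(se_o**2 + se_s**2)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p
```

A large p-value from such a test means only that the data do not show a difference between the two estimates. That is consistent with the reported pattern in which Wald tests found no significant difference while the stricter estimate and decision agreement checks still disagreed in some subgroups.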

Conclusions: Overall, the comparison of synthetic with original real-world data demonstrated good validity and reliability. Transparency about the process used to generate high-fidelity synthetic data, and assurances of patient privacy, are warranted.

Source journal metrics: CiteScore 4.80; self-citation rate 7.70%; articles published: 173; review time: 3 months
Journal description: The aim of Pharmacoepidemiology and Drug Safety is to provide an international forum for the communication and evaluation of data, methods and opinion in the discipline of pharmacoepidemiology. The Journal publishes peer-reviewed reports of original research, invited reviews and a variety of guest editorials and commentaries embracing scientific, medical, statistical, legal and economic aspects of pharmacoepidemiology and post-marketing surveillance of drug safety. Appropriate material in these categories may also be considered for publication as a Brief Report. Particular areas of interest include: design, analysis, results, and interpretation of studies looking at the benefit or safety of specific pharmaceuticals, biologics, or medical devices, including studies in pharmacovigilance, postmarketing surveillance, pharmacoeconomics, patient safety, molecular pharmacoepidemiology, or any other study within the broad field of pharmacoepidemiology; comparative effectiveness research relating to pharmaceuticals, biologics, and medical devices. Comparative effectiveness research is the generation and synthesis of evidence that compares the benefits and harms of alternative methods to prevent, diagnose, treat, and monitor a clinical condition, as these methods are truly used in the real world; methodologic contributions of relevance to pharmacoepidemiology, whether original contributions, reviews of existing methods, or tutorials for how to apply the methods of pharmacoepidemiology; assessments of harm versus benefit in drug therapy; patterns of drug utilization; relationships between pharmacoepidemiology and the formulation and interpretation of regulatory guidelines; evaluations of risk management plans and programmes relating to pharmaceuticals, biologics and medical devices.
Latest articles from this journal:
Core Concepts in Pharmacoepidemiology: Time-To-Event Analysis Approaches in Pharmacoepidemiology.
Hydrochlorothiazide Use and Risk of Skin Cancer: A Population-Based Retrospective Cohort Study.
The Oncology QCARD Initiative: Fostering efficient evaluation of initial real-world data proposals.
Validation Study of the Claims-Based Algorithm Using the International Classification of Diseases Codes to Identify Patients With Coronavirus Disease in Japan From 2020 to 2022: The VENUS Study.
A Validated Algorithm to Identify Hepatic Decompensation in the Veterans Health Administration Electronic Health Record System.