Anonymize or synthesize? Privacy-preserving methods for heart failure score analytics.

Journal: European Heart Journal. Digital Health (IF 4.4; JCR Q1, Cardiac & Cardiovascular Systems). Pub Date: 2024-11-20. eCollection Date: 2025-01-01. DOI: 10.1093/ehjdh/ztae083
Tim I Johann, Karen Otte, Fabian Prasser, Christoph Dieterich
Citation: European Heart Journal. Digital Health, volume 6, issue 1, pages 147-154. Publication type: Journal Article. Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11750188/pdf/

Abstract

Aims: Data availability remains a critical challenge in modern, data-driven medical research. Due to the sensitive nature of patient health records, they are rightfully subject to stringent privacy protection measures. One way to overcome these restrictions is to preserve patient privacy by using anonymization and synthetization strategies. In this work, we investigate the effectiveness of these methods for protecting patient privacy using real-world cardiology health records.

Methods and results: We implemented anonymization and synthetization techniques for a structured data set collected during the HiGHmed Use Case Cardiology study. We employed the data anonymization tool ARX and the data synthetization framework ASyH, individually and in combination. We evaluated the utility and shortcomings of the different approaches through statistical analyses and privacy risk assessments. Data utility was assessed by computing two heart failure risk scores on the protected data sets. We observed only minimal deviations from the scores computed on the original data set. Additionally, we performed a re-identification risk analysis and found only minor residual risks for common types of privacy threats.

Conclusion: We demonstrate that anonymization and synthetization methods protect privacy while retaining data utility for heart failure risk assessment. Both approaches, and a combination thereof, introduce only minimal deviations from the original data set across all features. While data synthesis techniques can produce any number of new records, data anonymization techniques offer more formal privacy guarantees. Consequently, data synthesis on anonymized data further enhances privacy protection with little impact on data utility. We share all generated data sets with the scientific community through a use and access agreement.
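
The abstract does not say which risk models the re-identification analysis used. As a minimal sketch of one common measure, the so-called prosecutor risk, the Python below groups records by their quasi-identifier values and assigns each record a risk of 1 divided by the size of its group. The function names, the column names, and the toy records are all hypothetical and are not taken from the study or from ARX.

```python
from collections import Counter


def prosecutor_risks(records, quasi_identifiers):
    """Per-record re-identification risk: 1 / size of the record's equivalence class.

    An equivalence class is the set of records that share the same values
    on all quasi-identifiers. This is an illustrative measure only.
    """
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    class_sizes = Counter(keys)
    return [1.0 / class_sizes[k] for k in keys]


def summarize(risks):
    """Highest and average risk over all records."""
    return {"highest": max(risks), "average": sum(risks) / len(risks)}


# Made-up records whose quasi-identifiers are already generalized
# (age in ten-year bands); "lab_value" is treated as non-identifying here.
records = [
    {"age": "60-69", "sex": "F", "lab_value": 41},
    {"age": "60-69", "sex": "F", "lab_value": 55},
    {"age": "70-79", "sex": "M", "lab_value": 38},
    {"age": "70-79", "sex": "M", "lab_value": 47},
    {"age": "70-79", "sex": "M", "lab_value": 52},
]

risks = prosecutor_risks(records, ["age", "sex"])
summary = summarize(risks)
print(summary)
```

In this toy data the smallest equivalence class has two records, so the table is 2-anonymous with respect to age and sex, and no record has a risk above 0.5. Coarser generalization enlarges the classes and lowers the risk, at the cost of less precise values.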
