The urgent need to accelerate synthetic data privacy frameworks for medical research

Impact factor: 23.8; Zone 1 (Medicine); JCR Q1, Medical Informatics; Lancet Digital Health; Publication date: 2025-02-01; DOI: 10.1016/S2589-7500(24)00196-1
Anmol Arora MBBChir MA, Siegfried Karl Wagner PhD FRCOphth, Robin Carpenter BSc, Rajesh Jena MD, Pearse A Keane MD FRCOphth
Lancet Digital Health, volume 7, issue 2, pages e157-e160. Publication type: Journal Article. Source: https://www.sciencedirect.com/science/article/pii/S2589750024001961
Citations: 0

Abstract

Synthetic data, generated through artificial intelligence technologies such as generative adversarial networks and latent diffusion models, maintain aggregate patterns and relationships present in the real data the technologies were trained on without exposing individual identities, thereby mitigating re-identification risks. This approach has been gaining traction in biomedical research because of its ability to preserve privacy and enable dataset sharing between organisations. Although the use of synthetic data has become widespread in other domains, such as finance and high-energy physics, use in medical research raises novel issues. The use of synthetic data as a method of preserving the privacy of data used to train models requires that the data are high fidelity with the original data to preserve utility, but must be sufficiently different as to protect against adversarial or accidental re-identification. There is a need for the development of standards for synthetic data generation and consensus standards for its evaluation. As synthetic data applications expand, ongoing legal and ethical evaluations are crucial to ensure that they remain a secure and effective tool for advancing medical research without compromising individual privacy.
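The tension the abstract describes, fidelity to the original data versus distance from any individual record, can be illustrated with a small sketch. Everything here is an assumption for illustration and is not taken from the article: the two metrics (a per-column mean gap as a crude fidelity proxy, distance to the closest real record as a crude privacy proxy), the hypothetical two-column data, and the noise-perturbed copy that stands in for the output of a real generative model.

```python
import math
import random
import statistics


def mean_gap(real, synthetic):
    """Fidelity proxy: largest absolute difference between per-column means."""
    gaps = []
    for col in range(len(real[0])):
        real_mean = statistics.fmean(row[col] for row in real)
        synth_mean = statistics.fmean(row[col] for row in synthetic)
        gaps.append(abs(real_mean - synth_mean))
    return max(gaps)


def distance_to_closest_record(real, synthetic):
    """Privacy proxy: Euclidean distance from each synthetic row to its nearest real row."""
    return [min(math.dist(s, r) for r in real) for s in synthetic]


rng = random.Random(0)

# Hypothetical "real" cohort: two numeric columns (for example age and systolic blood pressure).
real = [[rng.gauss(50, 10), rng.gauss(120, 15)] for _ in range(200)]

# Stand-in 1: a "synthetic" set that simply copies the training rows (perfect fidelity, no privacy).
copied = [row[:] for row in real]

# Stand-in 2: the same rows with added noise, a placeholder for a real generator's output.
noisy = [[value + rng.gauss(0, 5) for value in row] for row in real]

for name, synth in [("copied", copied), ("noisy", noisy)]:
    dcr = distance_to_closest_record(real, synth)
    print(name, "mean gap:", round(mean_gap(real, synth), 3), "min DCR:", round(min(dcr), 3))
```

A copied dataset scores perfectly on the fidelity proxy while every synthetic row sits at distance zero from a real record, which is exactly the re-identification risk the abstract warns about. Agreed standards for evaluation would need to fix which metrics and thresholds count as acceptable on both axes.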

Source journal

CiteScore: 41.20
Self-citation rate: 1.60%
Articles published: 232
Review time: 13 weeks

About the journal: The Lancet Digital Health publishes important, innovative, and practice-changing research on any topic connected with digital technology in clinical medicine, public health, and global health. The journal’s open access content crosses subject boundaries, building bridges between health professionals and researchers. By bringing together the most important advances in this multidisciplinary field, The Lancet Digital Health is the most prominent publishing venue in digital health. We publish a range of content types including Articles, Review, Comment, and Correspondence, contributing to promoting digital technologies in health practice worldwide.

Latest articles from the journal

The urgent need to accelerate synthetic data privacy frameworks for medical research
From the 100 Day Mission to 100 lines of software development: how to improve early outbreak analytics
Prediction of emergency admissions: trade-offs between model simplicity and performance
AI for medical diagnosis: does a single negative trial mean it is ineffective?
Artificial intelligence-guided detection of under-recognised cardiomyopathies on point-of-care cardiac ultrasonography: a multicentre study