The urgent need to accelerate synthetic data privacy frameworks for medical research

IF 23.8, Zone 1 (Medicine), Q1 Medical Informatics, Lancet Digital Health. Pub Date: 2025-02-01. DOI: 10.1016/S2589-7500(24)00196-1

Anmol Arora MBBChir MA , Siegfried Karl Wagner PhD FRCOphth , Robin Carpenter BSc , Rajesh Jena MD , Pearse A Keane MD FRCOphth

Lancet Digital Health, volume 7, issue 2, pages e157-e160. Journal Article. https://www.sciencedirect.com/science/article/pii/S2589750024001961

Citations: 0

Abstract

Synthetic data, generated through artificial intelligence technologies such as generative adversarial networks and latent diffusion models, maintain aggregate patterns and relationships present in the real data the technologies were trained on without exposing individual identities, thereby mitigating re-identification risks. This approach has been gaining traction in biomedical research because of its ability to preserve privacy and enable dataset sharing between organisations. Although the use of synthetic data has become widespread in other domains, such as finance and high-energy physics, use in medical research raises novel issues. The use of synthetic data as a method of preserving the privacy of data used to train models requires that the data are high fidelity with the original data to preserve utility, but must be sufficiently different as to protect against adversarial or accidental re-identification. There is a need for the development of standards for synthetic data generation and consensus standards for its evaluation. As synthetic data applications expand, ongoing legal and ethical evaluations are crucial to ensure that they remain a secure and effective tool for advancing medical research without compromising individual privacy.




Source journal

Lancet Digital Health Multiple-

CiteScore

41.20

Self-citation rate

1.60%

Articles published

232

Review time

13 weeks

About the journal: The Lancet Digital Health publishes important, innovative, and practice-changing research on any topic connected with digital technology in clinical medicine, public health, and global health. The journal’s open access content crosses subject boundaries, building bridges between health professionals and researchers. By bringing together the most important advances in this multidisciplinary field, The Lancet Digital Health is the most prominent publishing venue in digital health. We publish a range of content types including Articles, Review, Comment, and Correspondence, contributing to promoting digital technologies in health practice worldwide.