Anmol Arora MBBChir MA , Siegfried Karl Wagner PhD FRCOphth , Robin Carpenter BSc , Rajesh Jena MD , Pearse A Keane MD FRCOphth
{"title":"The urgent need to accelerate synthetic data privacy frameworks for medical research","authors":"Anmol Arora MBBChir MA , Siegfried Karl Wagner PhD FRCOphth , Robin Carpenter BSc , Rajesh Jena MD , Pearse A Keane MD FRCOphth","doi":"10.1016/S2589-7500(24)00196-1","DOIUrl":null,"url":null,"abstract":"<div><div>Synthetic data, generated through artificial intelligence technologies such as generative adversarial networks and latent diffusion models, maintain aggregate patterns and relationships present in the real data the technologies were trained on without exposing individual identities, thereby mitigating re-identification risks. This approach has been gaining traction in biomedical research because of its ability to preserve privacy and enable dataset sharing between organisations. Although the use of synthetic data has become widespread in other domains, such as finance and high-energy physics, use in medical research raises novel issues. The use of synthetic data as a method of preserving the privacy of data used to train models requires that the data are high fidelity with the original data to preserve utility, but must be sufficiently different as to protect against adversarial or accidental re-identification. There is a need for the development of standards for synthetic data generation and consensus standards for its evaluation. As synthetic data applications expand, ongoing legal and ethical evaluations are crucial to ensure that they remain a secure and effective tool for advancing medical research without compromising individual privacy.</div></div>","PeriodicalId":48534,"journal":{"name":"Lancet Digital Health","volume":"7 2","pages":"Pages e157-e160"},"PeriodicalIF":23.8000,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Lancet Digital Health","FirstCategoryId":"3","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2589750024001961","RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MEDICAL INFORMATICS","Score":null,"Total":0}
引用次数: 0
Abstract
Synthetic data, generated through artificial intelligence technologies such as generative adversarial networks and latent diffusion models, maintain aggregate patterns and relationships present in the real data the technologies were trained on without exposing individual identities, thereby mitigating re-identification risks. This approach has been gaining traction in biomedical research because of its ability to preserve privacy and enable dataset sharing between organisations. Although the use of synthetic data has become widespread in other domains, such as finance and high-energy physics, use in medical research raises novel issues. The use of synthetic data as a method of preserving the privacy of data used to train models requires that the data are high fidelity with the original data to preserve utility, but must be sufficiently different as to protect against adversarial or accidental re-identification. There is a need for the development of standards for synthetic data generation and consensus standards for its evaluation. As synthetic data applications expand, ongoing legal and ethical evaluations are crucial to ensure that they remain a secure and effective tool for advancing medical research without compromising individual privacy.
期刊介绍:
The Lancet Digital Health publishes important, innovative, and practice-changing research on any topic connected with digital technology in clinical medicine, public health, and global health.
The journal’s open access content crosses subject boundaries, building bridges between health professionals and researchers.By bringing together the most important advances in this multidisciplinary field,The Lancet Digital Health is the most prominent publishing venue in digital health.
We publish a range of content types including Articles,Review, Comment, and Correspondence, contributing to promoting digital technologies in health practice worldwide.