AI-driven synthetic data generation for accelerating hepatology research: A study of the United Network for Organ Sharing (UNOS) database

Journal: Hepatology (Impact Factor 15.8; Zone 1, Medicine; JCR Q1, Gastroenterology & Hepatology). Publication date: 2025-03-11. DOI: 10.1097/hep.0000000000001299
Joseph C. Ahn, Yung-Kyun Noh, Mingzhao Hu, Xiaotong Shen, Douglas A. Simonetto, Patrick S. Kamath, Rohit Loomba, Vijay H. Shah
Citations: 0

Abstract

Background and Aims: Clinical hepatology research often faces limited data availability, underrepresentation of minority groups, and complex data-sharing regulations. Synthetic data (artificially generated patient records designed to mirror real-world distributions) offers a potential solution. We hypothesized that diffusion models, a state-of-the-art generative technique, could produce synthetic liver transplant waitlist data from the United Network for Organ Sharing (UNOS) database that maintains statistical fidelity, replicates clinical correlations and survival patterns, and ensures robust privacy protection.

Methods: Diffusion models were used to generate synthetic patient cohorts mirroring the UNOS liver transplant waitlist database for the years 2019 to 2023. Statistical fidelity was assessed using Maximum Mean Discrepancy (MMD), Wasserstein distance, correlation analysis, and variable-level metrics. Clinical utility was evaluated by comparing transplant-free survival via Kaplan-Meier curves and by comparing the performance of the MELD score. Privacy was quantified using the Distance to Closest Record (DCR) and attribute disclosure risk assessments.

Results: The synthetic dataset was nearly indistinguishable from the original dataset (MMD = 0.002, standardized Wasserstein distance < 1.0), preserving clinically relevant correlations and survival patterns, as evidenced by similar median survival times (110 vs. 101 days) and 5-year survival rates (22.2% vs. 22.8%). MELD-based 90-day mortality prediction was maintained (original AUC = 0.839 vs. synthetic AUC = 0.844). Privacy metrics indicated no identifiable patient matches, and mean DCR values ensured that synthetic individuals were not direct replicas of real patients.

Conclusion: AI-generated synthetic data derived from diffusion models can faithfully replicate complex hepatology datasets, maintain key clinical signals, and ensure strong privacy safeguards. This approach can help address data scarcity, enhance model generalizability, foster multi-institutional collaboration, and accelerate progress in hepatology research.
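
The fidelity and privacy metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the kernel used for MMD, how the Wasserstein distance was standardized, or the distance metric behind DCR, so the RBF kernel, the `gamma` value, the per-feature Wasserstein computation, and the Euclidean distance below are all assumptions. The data are random toy values, not UNOS records, and every function name is hypothetical.

```python
import math
import random


def rbf_kernel(x, y, gamma):
    """Gaussian (RBF) kernel between two equal-length numeric vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def mmd_squared(sample_a, sample_b, gamma=1.0):
    """Biased (V-statistic) estimate of squared MMD under an RBF kernel."""
    def mean_kernel(rows_a, rows_b):
        total = sum(rbf_kernel(x, y, gamma) for x in rows_a for y in rows_b)
        return total / (len(rows_a) * len(rows_b))

    return (mean_kernel(sample_a, sample_a)
            + mean_kernel(sample_b, sample_b)
            - 2.0 * mean_kernel(sample_a, sample_b))


def wasserstein_1d(values_a, values_b):
    """1-Wasserstein distance between two equal-size one-dimensional samples."""
    if len(values_a) != len(values_b):
        raise ValueError("this sketch only handles samples of equal size")
    pairs = zip(sorted(values_a), sorted(values_b))
    return sum(abs(a - b) for a, b in pairs) / len(values_a)


def distance_to_closest_record(real_rows, synthetic_rows):
    """Euclidean distance from each synthetic row to its nearest real row."""
    return [min(math.dist(s, r) for r in real_rows) for s in synthetic_rows]


# Hypothetical toy data: two standardized numeric features per "patient".
rng = random.Random(0)
real = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
synthetic = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
shifted = [(x + 2.0, y + 2.0) for x, y in real]  # deliberately poor "synthetic" data

print("MMD^2, same distribution:", mmd_squared(real, synthetic))
print("MMD^2, shifted sample:   ", mmd_squared(real, shifted))
print("Wasserstein, feature 0:  ",
      wasserstein_1d([r[0] for r in real], [s[0] for s in synthetic]))
print("Mean DCR:                ",
      sum(distance_to_closest_record(real, synthetic)) / len(synthetic))
```

In this sketch, a small MMD and a small per-feature Wasserstein distance suggest that the two samples have similar distributions, while a DCR of exactly zero for any synthetic row would flag a verbatim copy of a real record.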
Source journal

Hepatology (Medicine: Gastroenterology & Hepatology)
CiteScore: 27.50
Self-citation rate: 3.70%
Articles published: 609
Review time: 1 month
About the journal: HEPATOLOGY is recognized as the leading publication in the field of liver disease. It features original, peer-reviewed articles covering various aspects of liver structure, function, and disease. The journal's distinguished Editorial Board carefully selects the best articles each month, focusing on topics including immunology, chronic hepatitis, viral hepatitis, cirrhosis, genetic and metabolic liver diseases, liver cancer, and drug metabolism.