Conditional Generative Models for Synthetic Tabular Data: Applications for Precision Medicine and Diverse Representations.

Annual Review of Biomedical Data Science (IF 7, Q1, Mathematical & Computational Biology). Pub Date: 2025-01-14. DOI: 10.1146/annurev-biodatasci-103123-094844
Kara Liu, Russ B Altman
Citations: 0

Abstract

Tabular medical datasets, like electronic health records (EHRs), biobanks, and structured clinical trial data, are rich sources of information with the potential to advance precision medicine and optimize patient care. However, real-world medical datasets have limited patient diversity and cannot simulate hypothetical outcomes, both of which are necessary for equitable and effective medical research. Fueled by recent advancements in machine learning, generative models offer a promising solution to these data limitations by generating enhanced synthetic data. This review highlights the potential of conditional generative models (CGMs) to create patient-specific synthetic data for a variety of precision medicine applications. We survey CGM approaches that tackle two medical applications: correcting for data representation biases and simulating digital health twins. We additionally explore how the surveyed methods handle modeling tabular medical data and briefly discuss evaluation criteria. Finally, we summarize the technical, medical, and ethical challenges that must be addressed before CGMs can be effectively and safely deployed in the medical field.

Source journal metrics: CiteScore 11.10; self-citation rate 1.70%; articles published: 0
Journal description: The Annual Review of Biomedical Data Science provides comprehensive expert reviews in biomedical data science, focusing on advanced methods to store, retrieve, analyze, and organize biomedical data and knowledge. The scope of the journal encompasses informatics, computational, artificial intelligence (AI), and statistical approaches to biomedical data, including the sub-fields of bioinformatics, computational biology, biomedical informatics, clinical and clinical research informatics, biostatistics, and imaging informatics. The mission of the journal is to identify both emerging and established areas of biomedical data science, and the leaders in these fields.
Latest articles from this journal:
Genetic Studies Through the Lens of Gene Networks.
Evaluation and Regulation of Artificial Intelligence Medical Devices for Clinical Decision Support.
Foundation Models for Translational Cancer Biology.
Conditional Generative Models for Synthetic Tabular Data: Applications for Precision Medicine and Diverse Representations.
Spatial Transcriptomics Brings New Challenges and Opportunities for Trajectory Inference.