SynTwin: A graph-based approach for predicting clinical outcomes using digital twins derived from synthetic patients.

Jason H Moore, Xi Li, Jui-Hsuan Chang, Nicholas P Tatonetti, Dan Theodorescu, Yong Chen, Folkert W Asselbergs, Mythreye Venkatesan, Zhiping Paul Wang
{"title":"SynTwin: A graph-based approach for predicting clinical outcomes using digital twins derived from synthetic patients.","authors":"Jason H Moore, Xi Li, Jui-Hsuan Chang, Nicholas P Tatonetti, Dan Theodorescu, Yong Chen, Folkert W Asselbergs, Mythreye Venkatesan, Zhiping Paul Wang","doi":"","DOIUrl":null,"url":null,"abstract":"<p><p>The concept of a digital twin came from the engineering, industrial, and manufacturing domains to create virtual objects or machines that could inform the design and development of real objects. This idea is appealing for precision medicine where digital twins of patients could help inform healthcare decisions. We have developed a methodology for generating and using digital twins for clinical outcome prediction. We introduce a new approach that combines synthetic data and network science to create digital twins (i.e. SynTwin) for precision medicine. First, our approach starts by estimating the distance between all subjects based on their available features. Second, the distances are used to construct a network with subjects as nodes and edges defining distance less than the percolation threshold. Third, communities or cliques of subjects are defined. Fourth, a large population of synthetic patients are generated using a synthetic data generation algorithm that models the correlation structure of the data to generate new patients. Fifth, digital twins are selected from the synthetic patient population that are within a given distance defining a subject community in the network. Finally, we compare and contrast community-based prediction of clinical endpoints using real subjects, digital twins, or both within and outside of the community. Key to this approach are the digital twins defined using patient similarity that represent hypothetical unobserved patients with patterns similar to nearby real patients as defined by network distance and community structure. We apply our SynTwin approach to predicting mortality in a population-based cancer registry (n=87,674) from the Surveillance, Epidemiology, and End Results (SEER) program from the National Cancer Institute (USA). Our results demonstrate that nearest network neighbor prediction of mortality in this study is significantly improved with digital twins (AUROC=0.864, 95% CI=0.857-0.872) over just using real data alone (AUROC=0.791, 95% CI=0.781-0.800). These results suggest a network-based digital twin strategy using synthetic patients may add value to precision medicine efforts.</p>","PeriodicalId":34954,"journal":{"name":"Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10827004/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing","FirstCategoryId":"1085","ListUrlMain":"","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"Computer Science","Score":null,"Total":0}
引用次数: 0

Abstract

The concept of a digital twin came from the engineering, industrial, and manufacturing domains to create virtual objects or machines that could inform the design and development of real objects. This idea is appealing for precision medicine where digital twins of patients could help inform healthcare decisions. We have developed a methodology for generating and using digital twins for clinical outcome prediction. We introduce a new approach that combines synthetic data and network science to create digital twins (i.e. SynTwin) for precision medicine. First, our approach starts by estimating the distance between all subjects based on their available features. Second, the distances are used to construct a network with subjects as nodes and edges defining distance less than the percolation threshold. Third, communities or cliques of subjects are defined. Fourth, a large population of synthetic patients are generated using a synthetic data generation algorithm that models the correlation structure of the data to generate new patients. Fifth, digital twins are selected from the synthetic patient population that are within a given distance defining a subject community in the network. Finally, we compare and contrast community-based prediction of clinical endpoints using real subjects, digital twins, or both within and outside of the community. Key to this approach are the digital twins defined using patient similarity that represent hypothetical unobserved patients with patterns similar to nearby real patients as defined by network distance and community structure. We apply our SynTwin approach to predicting mortality in a population-based cancer registry (n=87,674) from the Surveillance, Epidemiology, and End Results (SEER) program from the National Cancer Institute (USA). Our results demonstrate that nearest network neighbor prediction of mortality in this study is significantly improved with digital twins (AUROC=0.864, 95% CI=0.857-0.872) over just using real data alone (AUROC=0.791, 95% CI=0.781-0.800). These results suggest a network-based digital twin strategy using synthetic patients may add value to precision medicine efforts.

分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
SynTwin:一种基于图谱的方法,利用从合成患者中提取的数字双胞胎预测临床结果。
数字孪生的概念来自工程、工业和制造领域,旨在创建虚拟物体或机器,为真实物体的设计和开发提供参考。这一想法对精准医疗很有吸引力,患者的数字孪生可以帮助医疗决策提供依据。我们开发了一种生成和使用数字双胞胎进行临床结果预测的方法。我们介绍了一种结合合成数据和网络科学的新方法,为精准医疗创建数字孪生(即 SynTwin)。首先,我们的方法是根据所有受试者的可用特征来估计他们之间的距离。其次,利用这些距离构建一个网络,以受试者为节点,边缘定义的距离小于渗透阈值。第三,定义受试者的群落或小群。第四,使用合成数据生成算法生成大量合成患者,该算法可模拟数据的相关结构,生成新的患者。第五,从合成患者群体中挑选出一定距离内的数字双胞胎,定义网络中的主体群落。最后,我们使用真实受试者、数字双胞胎或社区内外的受试者对基于社区的临床终点预测进行比较和对比。这种方法的关键在于使用患者相似性定义的数字孪生,它代表了假设的未观察到的患者,其模式与网络距离和社区结构定义的附近真实患者相似。我们将 SynTwin 方法应用于预测美国国家癌症研究所(National Cancer Institute,USA)监测、流行病学和最终结果(Surveillance,Epidemiology,and End Results,SEER)计划中基于人群的癌症登记(n=87,674)中的死亡率。我们的研究结果表明,在这项研究中,使用数字孪生(AUROC=0.864,95% CI=0.857-0.872)对死亡率进行最近网络邻接预测,比只使用真实数据(AUROC=0.791,95% CI=0.781-0.800)有显著提高。这些结果表明,使用合成患者的基于网络的数字孪生策略可能会为精准医疗工作增添价值。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
CiteScore
4.50
自引率
0.00%
发文量
0
期刊最新文献
FedBrain: Federated Training of Graph Neural Networks for Connectome-based Brain Imaging Analysis. Generating new drug repurposing hypotheses using disease-specific hypergraphs. Impact of Measurement Noise on Genetic Association Studies of Cardiac Function. Imputation of race and ethnicity categories using genetic ancestry from real-world genomic testing data. intCC: An efficient weighted integrative consensus clustering of multimodal data.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1