Generation of synthetic whole-slide image tiles of tumours from RNA-sequencing data via cascaded diffusion models.

Impact factor: 26.8 | CAS Zone 1 (Medicine) | JCR Q1 (Engineering, Biomedical) | Nature Biomedical Engineering | Pub Date: 2024-03-21 | DOI: 10.1038/s41551-024-01193-8
Francisco Carrillo-Perez, Marija Pizurica, Yuanning Zheng, Tarak Nath Nandi, Ravi Madduri, Jeanne Shen, Olivier Gevaert

Abstract

Training machine-learning models with synthetically generated data can alleviate the problem of data scarcity when acquiring diverse and sufficiently large datasets is costly and challenging. Here we show that cascaded diffusion models can be used to synthesize realistic whole-slide image tiles from latent representations of RNA-sequencing data from human tumours. Alterations in gene expression affected the composition of cell types in the generated synthetic image tiles, which accurately preserved the distribution of cell types and maintained the cell fraction observed in bulk RNA-sequencing data, as we show for lung adenocarcinoma, kidney renal papillary cell carcinoma, cervical squamous cell carcinoma, colon adenocarcinoma and glioblastoma. Machine-learning models pretrained with the generated synthetic data performed better than models trained from scratch. Synthetic data may accelerate the development of machine-learning models in scarce-data settings and allow for the imputation of missing data modalities.

Source journal
Nature Biomedical Engineering (Medicine: Medicine (miscellaneous))
CiteScore: 45.30
Self-citation rate: 1.10%
Articles published: 138
Journal description: Nature Biomedical Engineering is an online-only monthly journal launched in January 2017. It publishes original research, reviews, and commentary on applied biomedicine and health technology. Its audience includes life scientists who develop experimental or computational systems and methods to improve our understanding of human physiology; biomedical researchers and engineers who design or optimize therapies, assays, devices, or procedures for diagnosing or treating disease; and clinicians who use research outputs to evaluate patient health or administer therapy in clinical and healthcare settings.
Latest articles from this journal
Endocisternal interfaces for minimally invasive neural stimulation and recording of the brain and spinal cord
A stealthy neural recorder for the study of behaviour in primates
Large DNA deletions occur during DNA repair at 20-fold lower frequency for base editors and prime editors than for Cas9 nucleases
Spatially resolved subcellular protein–protein interactomics in drug-perturbed lung-cancer cultures and tissues
Ultrabright and ultrafast afterglow imaging in vivo via nanoparticles made of trianthracene derivatives