scRDiT: Generating single-cell RNA-seq data by diffusion transformers and accelerating sampling

Shengze Dong, Zhuorui Cui, Ding Liu, Jinzhi Lei
arXiv - QuanBio - Genomics. Published 2024-04-09. DOI: arxiv-2404.06153 (https://doi.org/arxiv-2404.06153)

Abstract

Motivation: Single-cell RNA sequencing (scRNA-seq) is a groundbreaking technology widely used in biological research to examine gene expression at the level of individual cells within a tissue sample. Although numerous tools have been developed for scRNA-seq data analysis, it remains challenging to capture the distinctive features of such data and to generate virtual datasets with similar statistical properties.

Results: We introduce a generative approach termed scRNA-seq Diffusion Transformer (scRDiT), which generates virtual scRNA-seq data by learning from a real dataset. The method is a neural network built on Denoising Diffusion Probabilistic Models (DDPMs) and Diffusion Transformers (DiTs). Gaussian noise is added to the real data through iterative noise-adding steps, and the noise is then removed to recover scRNA-seq samples. This scheme allows the model to learn data features from real scRNA-seq samples during training. Our experiments, conducted on two distinct scRNA-seq datasets, demonstrate superior performance. In addition, sampling is accelerated by incorporating Denoising Diffusion Implicit Models (DDIM). scRDiT provides a unified methodology that lets users train neural network models on their own scRNA-seq datasets and generate large numbers of high-quality scRNA-seq samples.

Availability and implementation: https://github.com/DongShengze/scRDiT
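The iterative noise-adding process and the DDIM shortcut mentioned in the abstract can be illustrated with a minimal sketch. This is not the scRDiT implementation: the linear schedule values, the function names (`linear_beta_schedule`, `cumulative_alphas`, `q_sample`, `ddim_step`), and the toy four-gene expression vector are all assumptions made for illustration. In the real method a trained diffusion transformer would predict the noise; here the true noise stands in for that prediction, so the example only shows the arithmetic of the standard DDPM forward process and the deterministic (eta = 0) DDIM update.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (values are an assumption)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def q_sample(x0, t, alpha_bars, rng):
    """Forward (noise-adding) process in closed form:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps, eps ~ N(0, I)."""
    ab = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(ab) * x + math.sqrt(1.0 - ab) * e for x, e in zip(x0, eps)]
    return xt, eps


def ddim_step(xt, eps_pred, t, t_prev, alpha_bars):
    """One deterministic DDIM update (eta = 0) from step t to an earlier step t_prev.
    t_prev = -1 means 'jump to the final clean sample' (alpha_bar = 1)."""
    ab_t = alpha_bars[t]
    ab_prev = alpha_bars[t_prev] if t_prev >= 0 else 1.0
    # Estimate the clean sample implied by the predicted noise.
    x0_pred = [(x - math.sqrt(1.0 - ab_t) * e) / math.sqrt(ab_t)
               for x, e in zip(xt, eps_pred)]
    # Re-noise that estimate to the (lower) noise level of t_prev.
    return [math.sqrt(ab_prev) * x0 + math.sqrt(1.0 - ab_prev) * e
            for x0, e in zip(x0_pred, eps_pred)]


if __name__ == "__main__":
    rng = random.Random(0)
    alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
    x0 = [0.0, 1.5, 3.0, 0.2]  # hypothetical normalized expression values
    xt, eps = q_sample(x0, 500, alpha_bars, rng)
    # Two large DDIM jumps (500 -> 250 -> clean) instead of 500 single steps.
    x_mid = ddim_step(xt, eps, 500, 250, alpha_bars)
    x_rec = ddim_step(x_mid, eps, 250, -1, alpha_bars)
    print(max(abs(a - b) for a, b in zip(x_rec, x0)))
```

Because the true noise is supplied in place of a network prediction, the two DDIM jumps recover the starting vector up to floating-point error. With a learned noise predictor the recovery is only approximate, and the number of skipped steps becomes a trade-off between sampling speed and sample quality.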