A versatile informative diffusion model for single-cell ATAC-seq data generation and analysis

Lei Huang, Lei Xiong, Na Sun, Zunpeng Liu, Ka-Chun Wong, Manolis Kellis
arXiv - QuanBio - Genomics. Published 2024-08-27. DOI: arxiv-2408.14801 (https://doi.org/arxiv-2408.14801)

Abstract

The rapid advancement of single-cell ATAC sequencing (scATAC-seq) technologies holds great promise for investigating the heterogeneity of epigenetic landscapes at the cellular level. The amplification process in scATAC-seq experiments often introduces noise through dropout events, producing extreme sparsity that hinders accurate analysis. Consequently, there is significant demand for generating high-quality scATAC-seq data in silico. Furthermore, current methods are typically task-specific and lack a versatile framework capable of handling multiple tasks within a single model. In this work, we propose ATAC-Diff, a versatile framework based on a latent diffusion model that is conditioned on latent auxiliary variables so that it can adapt to various tasks. ATAC-Diff is the first diffusion model for scATAC-seq data generation and analysis. It includes auxiliary modules that encode latent high-level variables, enabling the model to learn semantic information and sample high-quality data. With a Gaussian Mixture Model (GMM) as the latent prior and an auxiliary decoder, the resulting latent variables retain refined genomic information that is beneficial for downstream analyses. Another innovation is the incorporation of mutual information between observed and hidden variables as a regularization term, which prevents the model from decoupling from the latent variables. Through extensive experiments, we demonstrate that ATAC-Diff achieves high performance in both generation and analysis tasks, outperforming state-of-the-art models.
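To make two of the ingredients named in the abstract more concrete, the following is a minimal sketch, not the authors' implementation: it draws a conditioning vector from a diagonal Gaussian mixture prior and applies a generic DDPM-style forward noising step to a latent vector. The function names, the linear noise schedule, the mixture parameters, and the latent dimension are all assumptions chosen for illustration; the abstract does not specify any of them.

```python
import math
import random


def sample_gmm_prior(rng, weights, means, stds):
    """Draw one latent vector from a diagonal-covariance Gaussian mixture.

    Hypothetical helper: picks a component by its mixture weight, then
    samples each dimension independently from that component's Gaussian.
    """
    k = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    z = [rng.gauss(m, s) for m, s in zip(means[k], stds[k])]
    return k, z


def alpha_bar(betas, t):
    """Cumulative product of (1 - beta_s) for s = 0..t."""
    prod = 1.0
    for b in betas[: t + 1]:
        prod *= 1.0 - b
    return prod


def forward_diffuse(rng, z0, t, betas):
    """Sample z_t from q(z_t | z_0) = N(sqrt(ab) * z_0, (1 - ab) * I)."""
    ab = alpha_bar(betas, t)
    eps = [rng.gauss(0.0, 1.0) for _ in z0]
    zt = [math.sqrt(ab) * z + math.sqrt(1.0 - ab) * e for z, e in zip(z0, eps)]
    return zt, eps


rng = random.Random(0)

# Assumed toy mixture: three components in a 4-dimensional latent space.
weights = [0.5, 0.3, 0.2]
means = [[-3.0] * 4, [0.0] * 4, [3.0] * 4]
stds = [[0.5] * 4] * 3
component, condition = sample_gmm_prior(rng, weights, means, stds)

# Assumed linear beta schedule with 1000 steps.
T = 1000
betas = [1e-4 + (0.02 - 1e-4) * i / (T - 1) for i in range(T)]

# Assumed latent code of one cell, as an encoder might produce it.
z0 = [0.8, -1.2, 0.1, 2.0]
zt, eps = forward_diffuse(rng, z0, t=500, betas=betas)
```

In a conditional latent diffusion setup, a denoising network would typically be trained to predict `eps` from `zt`, the step `t`, and a conditioning vector such as `condition`. How ATAC-Diff wires these pieces together, and how its mutual-information regularizer is computed, is not described in the abstract.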