Scaling deep identifiable models enables zero-shot characterization of single-cell biological states.

Mingze Dong, Kriti Agrawal, Rong Fan, Esen Sefik, Richard A Flavell, Yuval Kluger
Journal: bioRxiv : the preprint server for biology (Journal Article)
Publication date: 2024-12-18
DOI: https://doi.org/10.1101/2023.11.11.566161
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10680588/pdf/

Abstract

Identifying true biological differences across samples while overcoming batch effects has been a persistent challenge in single-cell RNA-seq data analysis, hindering cross-dataset analyses that could yield transferable biological findings. In this work, we show that scaling up deep identifiable models leads to a surprisingly effective solution for this challenging task. We developed scShift, a deep variational inference framework with theoretical support for disentangling batch-dependent and batch-independent variation. By training the model with compendiums of scRNA-seq atlases, scShift shows remarkable zero-shot capabilities in revealing representations of cell types and biological states in single-cell data while overcoming batch effects. We employed scShift to systematically compare lung fibrosis states across different datasets, tissues and experimental systems. scShift uniquely extrapolates lung fibrosis states to previously unseen post-COVID-19 fibrosis, characterizing universal myeloid-fibrosis signatures, potential repurposing drug targets and fibrosis-associated cell interactions. Evaluations of over 200 trained scShift models demonstrate emergent zero-shot capabilities and a scaling law beyond a transition threshold, with respect to dataset diversity. With its scaling performance on massive single-cell compendiums and exceptional zero-shot capabilities, scShift represents an important advance toward next-generation computational models for single-cell analysis.

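Deep identifiable modeling of single-cell atlases enables zero-shot query of cellular states.
With single-cell RNA-seq datasets emerging at the atlas level, the potential of a universal model that is built on existing atlases and can extrapolate to new data remains unclear. A fundamental yet challenging problem for such a model is identifying the underlying biological and batch variation in a zero-shot manner, which is crucial for characterizing scRNA-seq datasets with new biological states. In this work, we present scShift, a mechanistic model that learns batch and biological patterns from atlas-level scRNA-seq data as well as perturbation scRNA-seq data. Leveraging recent advances in causal representation learning, scShift models genes as functions of latent biological processes, with sparse shifts induced by batch effects and biological perturbations. Through benchmarking on real datasets, we show that scShift reveals unified cell type representations as well as the underlying biological variation of query data in a zero-shot manner, outperforming widely used atlas integration, batch correction, and perturbation modeling approaches. scShift enables mapping gene expression profiles to perturbation labels and predicts meaningful targets for exhausted T cells as well as for a range of diseases in the CellxGene blood atlas.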
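
The modeling idea described above (genes as functions of latent biological processes, with sparse shifts induced by batch) can be illustrated with a toy simulation. This is only a sketch under loud assumptions, not scShift's implementation: the function name `simulate_sparse_shift`, the linear decoder `W`, the Poisson count model, and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np

def simulate_sparse_shift(n_cells=200, n_latent=8, n_genes=50,
                          n_batches=3, n_shifted=2, seed=0):
    """Toy generative sketch: latent processes plus sparse per-batch shifts.

    Everything here is an illustrative assumption, not the published model.
    """
    rng = np.random.default_rng(seed)
    # Hypothetical shared decoder mapping latent processes to genes.
    W = rng.normal(size=(n_latent, n_genes))
    # Each batch shifts only a few latent dimensions; the rest stay at zero.
    shifts = np.zeros((n_batches, n_latent))
    for b in range(n_batches):
        dims = rng.choice(n_latent, size=n_shifted, replace=False)
        shifts[b, dims] = rng.normal(scale=2.0, size=n_shifted)
    # Assign every cell to a batch and draw its batch-independent latent state.
    batch = rng.integers(0, n_batches, size=n_cells)
    z_bio = rng.normal(size=(n_cells, n_latent))
    # Batch-dependent variation enters as an additive sparse shift.
    z = z_bio + shifts[batch]
    # Log-linear Poisson counts, a simplifying assumption for scRNA-seq data.
    rate = np.exp(0.3 * (z @ W) / np.sqrt(n_latent))
    counts = rng.poisson(rate)
    return counts, batch, z_bio, shifts

counts, batch, z_bio, shifts = simulate_sparse_shift()
```

In this toy setting, disentangling would mean recovering `z_bio` and `shifts` from `counts` and `batch` alone; the sparsity of `shifts` is the kind of structural constraint that identifiability arguments typically rely on.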