Imputing abundance of over 2,500 surface proteins from single-cell transcriptomes with context-agnostic zero-shot deep ensembles.

Cell Systems. Published 2024-09-18 (Epub 2024-09-06). DOI: 10.1016/j.cels.2024.08.006
Ruoqiao Chen, Jiayu Zhou, Bin Chen
Journal article, Cell Systems, pages 869-884.e6. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11423933/pdf/

Abstract

Cell surface proteins serve as primary drug targets and cell identity markers. Techniques such as CITE-seq (cellular indexing of transcriptomes and epitopes by sequencing) have enabled the simultaneous quantification of surface protein abundance and transcript expression within individual cells. The published data have been utilized to train machine learning models for predicting surface protein abundance solely from transcript expression. However, the small scale of proteins predicted and the poor generalization ability of these computational approaches across diverse contexts (e.g., different tissues/disease states) impede their widespread adoption. Here, we propose SPIDER (surface protein prediction using deep ensembles from single-cell RNA sequencing), a context-agnostic zero-shot deep ensemble model, which enables large-scale protein abundance prediction and generalizes better to various contexts. Comprehensive benchmarking shows that SPIDER outperforms other state-of-the-art methods. Using the predicted surface abundance of >2,500 proteins from single-cell transcriptomes, we demonstrate the broad applications of SPIDER, including cell type annotation, biomarker/target identification, and cell-cell interaction analysis in hepatocellular carcinoma and colorectal cancer. A record of this paper's transparent peer review process is included in the supplemental information.
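The abstract describes SPIDER only as a "deep ensemble"; its architecture and training details are not given here. As a rough illustration of the general ensemble idea (not SPIDER's implementation), the sketch below trains several independently initialized regressors that map a cell's transcript expression to one protein's abundance, then reports the mean prediction and the spread across members. Tiny linear models stand in for neural networks, and the function names `train_linear_member`, `train_ensemble`, and `predict_ensemble` are hypothetical.

```python
import random
import statistics


def train_linear_member(X, y, seed, epochs=500, lr=0.05):
    """Fit one hypothetical ensemble member: a linear regressor trained by SGD.

    Each member gets its own random initialization and sample order (via `seed`),
    which is what makes ensemble members differ from one another.
    """
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 0.1) for _ in range(len(X[0]))]
    b = rng.gauss(0.0, 0.1)
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # member-specific pass order over the cells
        for i in order:
            pred = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            err = pred - y[i]
            # squared-error gradient step on weights and bias
            w = [wj - lr * err * xj for wj, xj in zip(w, X[i])]
            b -= lr * err
    return w, b


def train_ensemble(X, y, n_members=5):
    """Train n_members regressors on the same data with different seeds."""
    return [train_linear_member(X, y, seed=s) for s in range(n_members)]


def predict_ensemble(members, x):
    """Return (mean, population std) of the members' predictions for one cell."""
    preds = [sum(wj * xj for wj, xj in zip(w, x)) + b for w, b in members]
    return statistics.mean(preds), statistics.pstdev(preds)


if __name__ == "__main__":
    # Toy, noise-free data: rows are cells, columns are two genes' expression;
    # the "protein" abundance is assumed to be 2*geneA + 0.5*geneB.
    X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [1.0, 2.0]]
    y = [2.0 * a + 0.5 * g for a, g in X]
    members = train_ensemble(X, y)
    mean, std = predict_ensemble(members, [1.0, 1.0])
    # mean should be close to 2.5 (= 2*1 + 0.5*1) for this toy data
    print(f"predicted abundance {mean:.3f} +/- {std:.3f}")
```

On this noise-free toy data every member converges to nearly the same fit, so the spread is close to zero; with real, noisy data the disagreement between members is what ensemble methods commonly use as an uncertainty signal.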
