Artificial variables help to avoid over-clustering in single-cell RNA sequencing.

IF 8.1; CAS Tier 1 (Biology); Q1 GENETICS & HEREDITY; American Journal of Human Genetics; Pub Date: 2025-04-03; Epub Date: 2025-03-12; DOI: 10.1016/j.ajhg.2025.02.014
Alan DenAdel, Michelle L Ramseier, Andrew W Navia, Alex K Shalek, Srivatsan Raghavan, Peter S Winter, Ava P Amini, Lorin Crawford
Pages: 940-951; Publication type: Journal Article; Citations: 0
Full text (PMC, PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12081238/pdf/

Abstract

Standard single-cell RNA sequencing (scRNA-seq) pipelines nearly always include unsupervised clustering as a key step in identifying biologically distinct cell types. A follow-up step in these pipelines is to test for differential expression between the identified clusters. When algorithms over-cluster, downstream analyses can produce misleading results. In this work, we present "recall" (calibrated clustering with artificial variables), a method for protecting against over-clustering by controlling for the impact of reusing the same data twice when performing differential expression analysis, commonly known as "double dipping." Importantly, our approach can be applied to a wide range of clustering algorithms. Using real and simulated data, we show that recall provides state-of-the-art clustering performance and can rapidly analyze large-scale scRNA-seq studies, even on a personal laptop.
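The abstract does not spell out the procedure, so the following is only a minimal illustrative sketch of the general idea, written in plain Python (standard library only). It is not the authors' recall implementation. It assumes a knockoff-style construction. Each real gene gets one artificial "twin" sampled from a per-gene Gaussian fitted to that gene, so the twin carries no cluster signal. Cells are clustered on real and artificial genes together, so both are exposed to the same double dipping. Each gene's differential-expression p-value is then compared with its twin's through W_j = -log10(p_real) + log10(p_artificial). The two clusters are merged if the knockoff+ threshold of Barber and Candès selects no real gene. The Gaussian null, the Welch z-test, the 2-means clustering, the target FDR of 0.1, and every function name (`simulate`, `add_artificial_genes`, `kmeans2`, `de_pvalue`, `knockoff_threshold`, `should_merge`) are assumptions made for illustration.

```python
import math
import random
from statistics import fmean, variance


def simulate(n_per_group, n_genes, n_de, shift, rng):
    """Hypothetical toy data: cells x genes, two groups of cells that differ only in
    the first n_de genes (mean shifted by `shift`). n_de=0 means no real structure."""
    return [
        [rng.gauss(shift * group if g < n_de else 0.0, 1.0) for g in range(n_genes)]
        for group in (0, 1)
        for _ in range(n_per_group)
    ]


def add_artificial_genes(cells, rng):
    """Assumed null: append one artificial twin per real gene, drawn independently for
    every cell from a Gaussian with that gene's mean and SD (no cluster signal)."""
    params = [(fmean(col), math.sqrt(variance(col))) for col in zip(*cells)]
    return [row + [rng.gauss(m, s) for m, s in params] for row in cells]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans2(X, rng, n_init=5, n_iter=50):
    """Lloyd's 2-means on ALL columns (real + artificial), so both kinds of gene are
    reused by the later DE test. Keeps the restart with the lowest inertia."""
    best_labels, best_inertia = None, math.inf
    for _ in range(n_init):
        first = rng.choice(X)
        centers = [first, max(X, key=lambda x: sq_dist(x, first))]  # farthest point
        labels = None
        for _ in range(n_iter):
            new = [0 if sq_dist(x, centers[0]) <= sq_dist(x, centers[1]) else 1 for x in X]
            if new == labels:
                break  # assignments stopped changing
            labels = new
            groups = [[x for x, lab in zip(X, labels) if lab == k] for k in (0, 1)]
            if not all(groups):
                break  # one cluster emptied out; keep the current centers
            centers = [[fmean(col) for col in zip(*g)] for g in groups]
        inertia = sum(sq_dist(x, centers[lab]) for x, lab in zip(X, labels))
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels


def de_pvalue(a, b):
    """Two-sided p-value of a Welch z statistic (normal approximation), floored at
    1e-300 so that log10 stays finite."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0.0:
        return 1.0
    z = abs(fmean(a) - fmean(b)) / se
    return max(math.erfc(z / math.sqrt(2.0)), 1e-300)  # erfc(|z|/sqrt2) = 2(1 - Phi(|z|))


def knockoff_threshold(w, fdr=0.05):
    """Knockoff+ threshold: smallest t with (1 + #{W <= -t}) / max(1, #{W >= t}) <= fdr."""
    for t in sorted({abs(x) for x in w if x != 0}):
        fdp_hat = (1 + sum(x <= -t for x in w)) / max(1, sum(x >= t for x in w))
        if fdp_hat <= fdr:
            return t
    return math.inf  # nothing can be selected at this FDR


def should_merge(X_aug, labels, n_genes, fdr=0.1):
    """True if no real gene beats its artificial twin at the knockoff+ threshold."""
    groups = [[x for x, lab in zip(X_aug, labels) if lab == k] for k in (0, 1)]
    if min(len(g) for g in groups) < 2:
        return True
    w = []
    for j in range(n_genes):
        p_real = de_pvalue([x[j] for x in groups[0]], [x[j] for x in groups[1]])
        p_art = de_pvalue([x[n_genes + j] for x in groups[0]],
                          [x[n_genes + j] for x in groups[1]])
        w.append(math.log10(p_art) - math.log10(p_real))  # > 0: real gene wins
    t = knockoff_threshold(w, fdr)
    return not any(x >= t for x in w)


if __name__ == "__main__":
    rng = random.Random(0)
    for n_de in (15, 0):  # two true groups, then no structure at all
        X = add_artificial_genes(simulate(60, 30, n_de, 3.0, rng), rng)
        labels = kmeans2(X, rng)
        print(f"n_de={n_de}: merge the two clusters? {should_merge(X, labels, 30)}")
```

Running it prints one merge decision for a simulated two-group dataset and one for a dataset with no structure. In the structureless case any split found by 2-means is over-clustering. Real and artificial genes then look about equally "significant" after double dipping, so the filter should usually, though not always, recommend merging. A fuller pipeline would presumably repeat such a check across cluster pairs and coarsen the clustering until no pair needs merging; that outer loop is not shown here.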

Source journal: American Journal of Human Genetics
CiteScore: 14.70
Self-citation rate: 4.10%
Articles published: 185
Review time: 1 month
Journal description: The American Journal of Human Genetics (AJHG) is a monthly journal published by Cell Press and chosen by the American Society of Human Genetics (ASHG) as its premier publication starting in January 2008. AJHG is Cell Press's first society-owned journal, and both ASHG and Cell Press anticipate significant synergies between AJHG content and that of other Cell Press titles.