CHAI: consensus clustering through similarity matrix integration for cell-type identification.

Briefings in Bioinformatics · Pub Date: 2024-07-25 · DOI: 10.1093/bib/bbae411 · IF 7.7 · Zone 2 (Biology) · JCR Q1 (Biochemical Research Methods)

Musaddiq K Lodi, Muzammil Lodi, Kezie Osei, Vaishnavi Ranganathan, Priscilla Hwang, Preetam Ghosh

Volume 25, issue 5 · Journal Article · Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11359802/pdf/


Abstract

Several methods have been developed to computationally predict cell types from single-cell RNA sequencing (scRNAseq) data. As methods are developed, a common problem for investigators has been identifying the best method to apply to their specific use case. To address this challenge, we present CHAI (consensus Clustering tHrough similArIty matrix integratIon for single cell-type identification), a wisdom-of-crowds approach for scRNAseq clustering. CHAI presents two competing methods, CHAI-AvgSim and CHAI-SNF, each of which aggregates the clustering results from seven state-of-the-art clustering methods. CHAI-AvgSim and CHAI-SNF demonstrate superior performance across several benchmarking datasets. Furthermore, both CHAI methods outperform the most recent consensus clustering method, SAME-clustering. We demonstrate CHAI's practical use by identifying a leader tumor cell cluster enriched with CDH3. CHAI provides a platform for multiomic integration, and we show that CHAI-SNF performs better when spatial transcriptomics data are included. CHAI overcomes previous limitations by incorporating the most recent and top-performing scRNAseq clustering algorithms into the aggregation framework. It is also an intuitive and easily customizable R package: users may add their own clustering methods to the pipeline, or down-select just the ones they want to use for the clustering aggregation. This ensures that as more advanced clustering algorithms are developed, CHAI will remain useful to the community as a generalized framework. CHAI is available as an open-source R package on GitHub: https://github.com/lodimk2/chai.
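
To make the idea of aggregating clusterings through similarity matrices concrete, here is a minimal Python sketch of one generic way to do it. It is not taken from the CHAI package: the abstract does not say how the similarity matrices are built or how final clusters are derived, so the binary co-association matrices, the 0.5 threshold, the connected-components step, and all function names below are assumptions made for illustration only.

```python
from typing import List, Sequence


def coassociation(labels: Sequence[int]) -> List[List[float]]:
    """Binary similarity matrix: 1.0 where two cells share a cluster label."""
    n = len(labels)
    return [[1.0 if labels[i] == labels[j] else 0.0 for j in range(n)]
            for i in range(n)]


def average_similarity(clusterings: Sequence[Sequence[int]]) -> List[List[float]]:
    """Element-wise mean of the per-method co-association matrices."""
    mats = [coassociation(c) for c in clusterings]
    n = len(mats[0])
    return [[sum(m[i][j] for m in mats) / len(mats) for j in range(n)]
            for i in range(n)]


def consensus_labels(sim: List[List[float]], threshold: float = 0.5) -> List[int]:
    """Illustrative stand-in for a final clustering step: cells linked by
    similarity above `threshold` end up in the same connected component."""
    n = len(sim)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and sim[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Three hypothetical clustering methods applied to five cells.
runs = [
    [0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0],  # same partition as the first run, different label names
    [0, 0, 0, 1, 1],  # disagrees about the third cell
]
sim = average_similarity(runs)
consensus = consensus_labels(sim)
print(consensus)
```

Because each co-association matrix records only whether two cells were grouped together, the average is unaffected by how each method happens to name its clusters, which is why the second run's relabeling does not matter. In this toy input the third cell is grouped with the last two cells by two of three methods, so the majority view wins.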



Source journal

Briefings in Bioinformatics (Biology: Biochemical Research Methods)

CiteScore: 13.20

Self-citation rate: 13.70%

Articles published: 549

Review time: 6 months

About the journal: Briefings in Bioinformatics is an international journal serving as a platform for researchers and educators in the life sciences. It also appeals to mathematicians, statisticians, and computer scientists applying their expertise to biological challenges. The journal focuses on reviews tailored for users of databases and analytical tools in contemporary genetics, molecular and systems biology. It stands out by offering practical assistance and guidance to non-specialists in computerized methodologies. Covering a wide range from introductory concepts to specific protocols and analyses, the papers address bacterial, plant, fungal, animal, and human data. The journal's detailed subject areas include genetic studies of phenotypes and genotypes, mapping, DNA sequencing, expression profiling, gene expression studies, microarrays, alignment methods, protein profiles and HMMs, lipids, metabolic and signaling pathways, structure determination and function prediction, phylogenetic studies, and education and training.