Clustering methods for single-cell RNA-sequencing expression data: performance evaluation with varying sample sizes and cell compositions

Statistical Applications in Genetics and Molecular Biology; Pub Date: 2019-08-14; DOI: 10.1515/sagmb-2019-0004; IF 0.4; Zone 4 (Mathematics); JCR Q3 (Mathematics)

A. Suner


Citations: 4

Abstract

A number of specialized clustering methods have been developed for the accurate analysis of single-cell RNA-sequencing (scRNA-seq) expression data, and several reports have documented the performance of these methods under different conditions. To date, however, no study has systematically evaluated the performance of these clustering methods while taking into account the sample size and cell composition of a given scRNA-seq dataset. Herein, a comprehensive performance evaluation of 11 selected scRNA-seq clustering methods was performed using synthetic datasets with known sample sizes and numbers of subpopulations, as well as varying levels of transcriptome complexity. The results indicate that the overall performance of the clustering methods under study is highly dependent on the sample size and complexity of the scRNA-seq dataset. In most cases, clustering performance improved as the number of cells in a given expression dataset increased. The findings of this study also highlight the importance of sample size for the successful detection of rare cell subpopulations with an appropriate clustering tool.
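The abstract does not name the performance measures used. As an illustration only, the sketch below assumes the adjusted Rand index (ARI), a common way to score a clustering against known subpopulation labels, and implements it with the Python standard library. The function name, the toy labels, and the 95/5 cell split are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(true_labels, predicted_labels):
    """Adjusted Rand index between two labelings of the same cells."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("labelings must have the same length")
    n = len(true_labels)
    if n < 2:
        raise ValueError("need at least two cells to form a pair")

    # Contingency counts: how many cells share each (true, predicted) pair.
    pair_counts = Counter(zip(true_labels, predicted_labels))
    true_counts = Counter(true_labels)
    pred_counts = Counter(predicted_labels)

    # Number of cell pairs placed together under each grouping.
    sum_cells = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())
    total_pairs = comb(n, 2)

    expected = sum_true * sum_pred / total_pairs  # agreement expected by chance
    maximum = (sum_true + sum_pred) / 2           # agreement of a perfect match
    if maximum == expected:
        # Degenerate case, e.g. both labelings put every cell in one cluster.
        return 1.0
    return (sum_cells - expected) / (maximum - expected)


# Hypothetical toy dataset: 95 cells of a common type and 5 of a rare type.
true_labels = ["common"] * 95 + ["rare"] * 5
perfect = [0] * 95 + [1] * 5   # rare subpopulation recovered
merged = [0] * 100             # rare subpopulation absorbed into the majority

print(adjusted_rand_index(true_labels, perfect))
print(adjusted_rand_index(true_labels, merged))
```

In this toy case the merged labeling groups 95% of the cells with their own type, yet its chance-corrected score is no better than random because the rare subpopulation is missed entirely. That is one reason a chance-corrected measure is a reasonable choice when cell composition is unbalanced.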




Source journal

Statistical Applications in Genetics and Molecular Biology (Biology: Biochemistry and Molecular Biology)

CiteScore: 1.20

Self-citation rate: 11.10%

Review time: 6-12 weeks

Journal description: Statistical Applications in Genetics and Molecular Biology seeks to publish significant research on the application of statistical ideas to problems arising from computational biology. Papers should focus on the relevant statistical issues but should also contain a succinct description of the biological problem being considered. The range of topics is wide and includes linkage mapping, association studies, gene finding and sequence alignment, protein structure prediction, design and analysis of microarray data, molecular evolution and phylogenetic trees, DNA topology, and database search strategies. Both original research and review articles will be warmly received.