Ultrafast clustering of single-cell flow cytometry data using FlowGrid.

BMC Systems Biology, published 2019-04-05. DOI: 10.1186/s12918-019-0690-2
Xiaoxin Ye, Joshua W K Ho
Citations: 25

Abstract

Background: Flow cytometry is a popular technology for quantitative single-cell profiling of cell surface markers. It enables expression measurement of tens of cell surface protein markers in millions of single cells. It is a powerful tool for discovering cell sub-populations and quantifying cell population heterogeneity. Traditionally, scientists use manual gating to identify cell types, but the process is subjective and is not effective for large multidimensional data. Many clustering algorithms have been developed to analyse these data, but most of them do not scale to very large data sets with more than ten million cells.

Results: Here, we present a new clustering algorithm that combines the advantages of the density-based clustering algorithm DBSCAN with the scalability of grid-based clustering. The algorithm is implemented in Python as an open-source package, FlowGrid. FlowGrid is memory efficient and scales linearly with the number of cells. We evaluated FlowGrid against other state-of-the-art clustering programs and found that it produces similar clustering results in substantially less time. For example, FlowGrid completes a clustering task on a data set of 23.6 million cells in less than 12 seconds, while other algorithms take more than 500 seconds or fail with an error.
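
The Results paragraph describes the method only at a high level: cells are placed on a grid, and DBSCAN-style density reasoning is applied to grid bins instead of to individual cells. The following is a minimal, standard-library-only Python sketch of that general idea under stated assumptions; it is not FlowGrid's implementation or API. The function name `grid_density_cluster` and its parameters `bin_size` (one uniform bin width for every dimension) and `min_count` (the minimum number of cells for a bin to count as dense) are hypothetical.

```python
from collections import defaultdict, deque
from itertools import product


def grid_density_cluster(points, bin_size, min_count):
    """Label each cell with a cluster id, or -1 for noise.

    Hypothetical sketch of grid-based density clustering; not FlowGrid's code or API.
    points    -- sequence of equal-length tuples of marker intensities, one per cell
    bin_size  -- assumed uniform bin width along every dimension
    min_count -- assumed minimum number of cells for a bin to count as dense
    """
    # Step 1: map every cell to its grid bin in a single O(n) pass.
    bin_of = [tuple(int(x // bin_size) for x in p) for p in points]
    counts = defaultdict(int)
    for key in bin_of:
        counts[key] += 1

    # Step 2: dense bins play the role of DBSCAN core points.
    dense = {key for key, c in counts.items() if c >= min_count}

    # Step 3: join touching dense bins (diagonals included) with breadth-first
    # search, a grid analogue of DBSCAN's density-reachability. The 3**d - 1
    # neighbour offsets make this practical only for low dimensions.
    dims = len(points[0]) if points else 0
    offsets = [d for d in product((-1, 0, 1), repeat=dims) if any(d)]
    label_of_bin = {}
    next_label = 0
    for start in sorted(dense):  # sorted so labels are deterministic
        if start in label_of_bin:
            continue
        label_of_bin[start] = next_label
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for off in offsets:
                neighbour = tuple(c + o for c, o in zip(current, off))
                if neighbour in dense and neighbour not in label_of_bin:
                    label_of_bin[neighbour] = next_label
                    queue.append(neighbour)
        next_label += 1

    # Step 4: each cell inherits its bin's label; cells in sparse bins are noise.
    return [label_of_bin.get(key, -1) for key in bin_of]


cells = [
    (0.1, 0.1), (0.2, 0.3), (0.4, 0.2),  # bin (0, 0)
    (1.1, 0.2), (1.3, 0.4), (1.2, 0.9),  # bin (1, 0), adjacent to (0, 0)
    (5.1, 5.2), (5.3, 5.4), (5.5, 5.1),  # bin (5, 5), isolated
    (9.5, 0.5),                          # lone cell in bin (9, 0)
]
print(grid_density_cluster(cells, bin_size=1.0, min_count=2))
# → [0, 0, 0, 0, 0, 0, 1, 1, 1, -1]
```

Here the two neighbouring dense bins merge into cluster 0, the isolated dense bin becomes cluster 1, and the lone cell is labelled noise. The per-cell work (binning and the final label lookup) is a single pass over the data, while the graph search runs only over dense bins, which is one reason grid-based approaches can scale roughly linearly in the number of cells.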

Conclusions: FlowGrid is an ultrafast clustering algorithm for large single-cell flow cytometry data. The source code is available at https://github.com/VCCRI/FlowGrid .
