EpiCarousel: memory- and time-efficient identification of metacells for atlas-level single-cell chromatin accessibility data

IF 4.4 3区生物学 Q1 BIOCHEMICAL RESEARCH METHODS Bioinformatics Pub Date : 2024-04-01 DOI:10.1093/bioinformatics/btae191

Sijie Li, Yuxi Li, Yu Sun, Yaru Li, Xiaoyang Chen, Songming Tang, Shengquan Chen

{"title":"EpiCarousel: memory- and time-efficient identification of metacells for atlas-level single-cell chromatin accessibility data","authors":"Sijie Li, Yuxi Li, Yu Sun, Yaru Li, Xiaoyang Chen, Songming Tang, Shengquan Chen","doi":"10.1093/bioinformatics/btae191","DOIUrl":null,"url":null,"abstract":"Abstract Summary Recent technical advancements in single-cell chromatin accessibility sequencing (scCAS) have brought new insights to the characterization of epigenetic heterogeneity. As single-cell genomics experiments scale up to hundreds of thousands of cells, the demand for computational resources for downstream analysis grows intractably large and exceeds the capabilities of most researchers. Here, we propose EpiCarousel, a tailored Python package based on lazy loading, parallel processing, and community detection for memory- and time-efficient identification of metacells, i.e. the emergence of homogenous cells, in large-scale scCAS data. Through comprehensive experiments on five datasets of various protocols, sample sizes, dimensions, number of cell types, and degrees of cell-type imbalance, EpiCarousel outperformed baseline methods in systematic evaluation of memory usage, computational time, and multiple downstream analyses including cell type identification. Moreover, EpiCarousel executes preprocessing and downstream cell clustering on the atlas-level dataset with 707 043 cells and 1 154 611 peaks within 2 h consuming <75 GB of RAM and provides superior performance for characterizing cell heterogeneity than state-of-the-art methods. Availability and implementation The EpiCarousel software is well-documented and freely available at https://github.com/biox-nku/epicarousel. It can be seamlessly interoperated with extensive scCAS analysis toolkits.","PeriodicalId":8903,"journal":{"name":"Bioinformatics","volume":null,"pages":null},"PeriodicalIF":4.4000,"publicationDate":"2024-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Bioinformatics","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1093/bioinformatics/btae191","RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}

引用次数: 0

Abstract

Abstract Summary Recent technical advancements in single-cell chromatin accessibility sequencing (scCAS) have brought new insights to the characterization of epigenetic heterogeneity. As single-cell genomics experiments scale up to hundreds of thousands of cells, the demand for computational resources for downstream analysis grows intractably large and exceeds the capabilities of most researchers. Here, we propose EpiCarousel, a tailored Python package based on lazy loading, parallel processing, and community detection for memory- and time-efficient identification of metacells, i.e. the emergence of homogenous cells, in large-scale scCAS data. Through comprehensive experiments on five datasets of various protocols, sample sizes, dimensions, number of cell types, and degrees of cell-type imbalance, EpiCarousel outperformed baseline methods in systematic evaluation of memory usage, computational time, and multiple downstream analyses including cell type identification. Moreover, EpiCarousel executes preprocessing and downstream cell clustering on the atlas-level dataset with 707 043 cells and 1 154 611 peaks within 2 h consuming <75 GB of RAM and provides superior performance for characterizing cell heterogeneity than state-of-the-art methods. Availability and implementation The EpiCarousel software is well-documented and freely available at https://github.com/biox-nku/epicarousel. It can be seamlessly interoperated with extensive scCAS analysis toolkits.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

EpiCarousel：为图集级单细胞染色质可及性数据识别元细胞提供记忆和时间效率

摘要摘要单细胞染色质可及性测序（scCAS）技术的最新进展为表观遗传异质性的表征带来了新的见解。随着单细胞基因组学实验规模扩大到数十万个细胞，下游分析对计算资源的需求越来越大，超出了大多数研究人员的能力。在这里，我们提出了 EpiCarousel，这是一个基于懒加载、并行处理和群落检测的定制 Python 软件包，用于在大规模 scCAS 数据中高效地识别元细胞（即同源细胞的出现）。通过对不同协议、样本大小、维度、细胞类型数量和细胞类型失衡程度的五个数据集进行综合实验，EpiCarousel在内存使用、计算时间和包括细胞类型鉴定在内的多种下游分析的系统性评估中均优于基线方法。此外，EpiCarousel 还能在 2 小时内对包含 707043 个细胞和 1 154 611 个峰的图集级数据集执行预处理和下游细胞聚类，内存消耗小于 75 GB，在表征细胞异质性方面的性能优于最先进的方法。可用性和实施 EpiCarousel 软件文档齐全，可在 https://github.com/biox-nku/epicarousel 免费获取。它可以与多种 scCAS 分析工具包无缝互操作。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

Bioinformatics 生物-生化研究方法

CiteScore

11.20

自引率

5.20%

发文量

753

审稿时长

2.1 months

期刊介绍： The leading journal in its field, Bioinformatics publishes the highest quality scientific papers and review articles of interest to academic and industrial researchers. Its main focus is on new developments in genome bioinformatics and computational biology. Two distinct sections within the journal - Discovery Notes and Application Notes- focus on shorter papers; the former reporting biologically interesting discoveries using computational methods, the latter exploring the applications used for experiments.