Stability of scRNA-Seq Analysis Workflows is Susceptible to Preprocessing and is Mitigated by Regularized or Supervised Approaches.

IF 1.7 | Zone 4, Biology | Q4 Evolutionary Biology | Evolutionary Bioinformatics | Pub Date: 2022-01-01 | DOI: 10.1177/11769343221123050
Arda Durmaz, Jacob G Scott
Evolutionary Bioinformatics, vol. 18, article 11769343221123050 (2022). Journal Article. PDF: https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/07/96/10.1177_11769343221123050.PMC9527995.pdf
Citations: 1

Abstract

Background: Statistical methods developed to address various questions in single-cell datasets show increased variability across different parameter regimes. To further delineate the robustness of commonly used methods for single-cell RNA-Seq, we aimed to comprehensively review scRNA-Seq analysis workflows in the setting of dimension reduction, clustering, and trajectory inference.

Methods: We used datasets with temporal single-cell transcriptomics profiles from public repositories. Combining multiple methods at each level of the workflow, we performed over 6k analyses and evaluated the results of clustering and pseudotime estimation using the adjusted Rand index and rank correlation metrics. We further integrated neural network methods to assess whether models with increased complexity show an increased bias/variance trade-off.

Results: Combinatorial workflows showed that non-linear dimension reduction techniques such as t-SNE and UMAP are sensitive to initial preprocessing steps; hence, clustering results obtained in the dimension-reduced space of single-cell datasets should be used carefully. Similarly, pseudotime estimation methods that depend on a preceding non-linear dimension reduction step can produce highly variable trajectories. In contrast, methods that avoid non-linearity, such as WOT, can yield repeatable inferences of temporal gene expression dynamics. Furthermore, imputation methods do not substantially improve the repeatability of clustering or trajectory inference results. The choice of normalization method, by contrast, has a larger effect on downstream analysis, with ScTransform reducing variability overall.


Source journal
Evolutionary Bioinformatics (Biology, Evolutionary Biology)
CiteScore: 4.20
Self-citation rate: 0.00%
Articles published: 25
Review time: 12 months
About the journal: Evolutionary Bioinformatics is an open access, peer reviewed international journal focusing on evolutionary bioinformatics. The journal aims to support understanding of organismal form and function through use of molecular, genetic, genomic and proteomic data by giving due consideration to its evolutionary context.