scShapes: a statistical framework for identifying distribution shapes in single-cell RNA-sequencing data.

IF 11.8 2区 生物学 Q1 MULTIDISCIPLINARY SCIENCES GigaScience Pub Date : 2022-12-28 Epub Date: 2023-01-24 DOI:10.1093/gigascience/giac126
Malindrie Dharmaratne, Ameya S Kulkarni, Atefeh Taherian Fard, Jessica C Mar
{"title":"scShapes: a statistical framework for identifying distribution shapes in single-cell RNA-sequencing data.","authors":"Malindrie Dharmaratne, Ameya S Kulkarni, Atefeh Taherian Fard, Jessica C Mar","doi":"10.1093/gigascience/giac126","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Single-cell RNA sequencing (scRNA-seq) methods have been advantageous for quantifying cell-to-cell variation by profiling the transcriptomes of individual cells. For scRNA-seq data, variability in gene expression reflects the degree of variation in gene expression from one cell to another. Analyses that focus on cell-cell variability therefore are useful for going beyond changes based on average expression and, instead, identifying genes with homogeneous expression versus those that vary widely from cell to cell.</p><p><strong>Results: </strong>We present a novel statistical framework, scShapes, for identifying differential distributions in single-cell RNA-sequencing data using generalized linear models. Most approaches for differential gene expression detect shifts in the mean value. However, as single-cell data are driven by overdispersion and dropouts, moving beyond means and using distributions that can handle excess zeros is critical. scShapes quantifies gene-specific cell-to-cell variability by testing for differences in the expression distribution while flexibly adjusting for covariates if required. We demonstrate that scShapes identifies subtle variations that are independent of altered mean expression and detects biologically relevant genes that were not discovered through standard approaches.</p><p><strong>Conclusions: </strong>This analysis also draws attention to genes that switch distribution shapes from a unimodal distribution to a zero-inflated distribution and raises open questions about the plausible biological mechanisms that may give rise to this, such as transcriptional bursting. Overall, the results from scShapes help to expand our understanding of the role that gene expression plays in the transcriptional regulation of a specific perturbation or cellular phenotype. Our framework scShapes is incorporated into a Bioconductor R package (https://www.bioconductor.org/packages/release/bioc/html/scShapes.html).</p>","PeriodicalId":12581,"journal":{"name":"GigaScience","volume":null,"pages":null},"PeriodicalIF":11.8000,"publicationDate":"2022-12-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9871437/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"GigaScience","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1093/gigascience/giac126","RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2023/1/24 0:00:00","PubModel":"Epub","JCR":"Q1","JCRName":"MULTIDISCIPLINARY SCIENCES","Score":null,"Total":0}
引用次数: 0

Abstract

Background: Single-cell RNA sequencing (scRNA-seq) methods have been advantageous for quantifying cell-to-cell variation by profiling the transcriptomes of individual cells. For scRNA-seq data, variability in gene expression reflects the degree of variation in gene expression from one cell to another. Analyses that focus on cell-cell variability therefore are useful for going beyond changes based on average expression and, instead, identifying genes with homogeneous expression versus those that vary widely from cell to cell.

Results: We present a novel statistical framework, scShapes, for identifying differential distributions in single-cell RNA-sequencing data using generalized linear models. Most approaches for differential gene expression detect shifts in the mean value. However, as single-cell data are driven by overdispersion and dropouts, moving beyond means and using distributions that can handle excess zeros is critical. scShapes quantifies gene-specific cell-to-cell variability by testing for differences in the expression distribution while flexibly adjusting for covariates if required. We demonstrate that scShapes identifies subtle variations that are independent of altered mean expression and detects biologically relevant genes that were not discovered through standard approaches.

Conclusions: This analysis also draws attention to genes that switch distribution shapes from a unimodal distribution to a zero-inflated distribution and raises open questions about the plausible biological mechanisms that may give rise to this, such as transcriptional bursting. Overall, the results from scShapes help to expand our understanding of the role that gene expression plays in the transcriptional regulation of a specific perturbation or cellular phenotype. Our framework scShapes is incorporated into a Bioconductor R package (https://www.bioconductor.org/packages/release/bioc/html/scShapes.html).

Abstract Image

Abstract Image

Abstract Image

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
scShapes:用于识别单细胞 RNA 序列数据分布形状的统计框架。
背景:单细胞 RNA 测序(scRNA-seq)方法可通过分析单个细胞的转录组来量化细胞间的差异。对于 scRNA-seq 数据,基因表达的变异性反映了不同细胞间基因表达的差异程度。因此,以细胞间变异性为重点的分析有助于超越基于平均表达的变化,进而识别同质表达的基因与细胞间差异较大的基因:我们提出了一种新的统计框架--scShapes,用于使用广义线性模型识别单细胞 RNA 序列数据中的差异分布。差异基因表达的大多数方法都能检测到平均值的变化。scShapes 通过检测表达分布的差异来量化特定基因细胞间的变异性,同时根据需要灵活调整协变量。我们证明,scShapes 能够识别与平均表达量变化无关的微妙变化,并检测出标准方法未发现的生物相关基因:这项分析还引起了人们对分布形状从单峰分布转变为零膨胀分布的基因的关注,并提出了可能导致这种情况的合理生物学机制的开放性问题,如转录突变。总之,scShapes 的结果有助于拓展我们对基因表达在特定扰动或细胞表型的转录调控中所扮演角色的理解。我们的 scShapes 框架已纳入 Bioconductor R 软件包 (https://www.bioconductor.org/packages/release/bioc/html/scShapes.html)。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
GigaScience
GigaScience MULTIDISCIPLINARY SCIENCES-
CiteScore
15.50
自引率
1.10%
发文量
119
审稿时长
1 weeks
期刊介绍: GigaScience seeks to transform data dissemination and utilization in the life and biomedical sciences. As an online open-access open-data journal, it specializes in publishing "big-data" studies encompassing various fields. Its scope includes not only "omic" type data and the fields of high-throughput biology currently serviced by large public repositories, but also the growing range of more difficult-to-access data, such as imaging, neuroscience, ecology, cohort data, systems biology and other new types of large-scale shareable data.
期刊最新文献
IPEV: identification of prokaryotic and eukaryotic virus-derived sequences in virome using deep learning Large-scale genomic survey with deep learning-based method reveals strain-level phage specificity determinants An effective strategy for assembling the sex-limited chromosome Enhanced bovine genome annotation through integration of transcriptomics and epi-transcriptomics datasets facilitates genomic biology Korea4K: whole genome sequences of 4,157 Koreans with 107 phenotypes derived from extensive health check-ups
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1