scaDA: A novel statistical method for differential analysis of single-cell chromatin accessibility sequencing data.

IF 3.8 2区 生物学 Q1 BIOCHEMICAL RESEARCH METHODS PLoS Computational Biology Pub Date : 2024-08-02 eCollection Date: 2024-08-01 DOI:10.1371/journal.pcbi.1011854
Fengdi Zhao, Xin Ma, Bing Yao, Qing Lu, Li Chen
{"title":"scaDA: A novel statistical method for differential analysis of single-cell chromatin accessibility sequencing data.","authors":"Fengdi Zhao, Xin Ma, Bing Yao, Qing Lu, Li Chen","doi":"10.1371/journal.pcbi.1011854","DOIUrl":null,"url":null,"abstract":"<p><p>Single-cell ATAC-seq sequencing data (scATAC-seq) has been widely used to investigate chromatin accessibility on the single-cell level. One important application of scATAC-seq data analysis is differential chromatin accessibility (DA) analysis. However, the data characteristics of scATAC-seq such as excessive zeros and large variability of chromatin accessibility across cells impose a unique challenge for DA analysis. Existing statistical methods focus on detecting the mean difference of the chromatin accessible regions while overlooking the distribution difference. Motivated by real data exploration that distribution difference exists among cell types, we introduce a novel composite statistical test named \"scaDA\", which is based on zero-inflated negative binomial model (ZINB), for performing differential distribution analysis of chromatin accessibility by jointly testing the abundance, prevalence and dispersion simultaneously. Benefiting from both dispersion shrinkage and iterative refinement of mean and prevalence parameter estimates, scaDA demonstrates its superiority to both ZINB-based likelihood ratio tests and published methods by achieving the highest power and best FDR control in a comprehensive simulation study. In addition to demonstrating the highest power in three real sc-multiome data analyses, scaDA successfully identifies differentially accessible regions in microglia from sc-multiome data for an Alzheimer's disease (AD) study that are most enriched in GO terms related to neurogenesis and the clinical phenotype of AD, and AD-associated GWAS SNPs.</p>","PeriodicalId":20241,"journal":{"name":"PLoS Computational Biology","volume":null,"pages":null},"PeriodicalIF":3.8000,"publicationDate":"2024-08-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11324137/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"PLoS Computational Biology","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1371/journal.pcbi.1011854","RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/8/1 0:00:00","PubModel":"eCollection","JCR":"Q1","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}
引用次数: 0

Abstract

Single-cell ATAC-seq sequencing data (scATAC-seq) has been widely used to investigate chromatin accessibility on the single-cell level. One important application of scATAC-seq data analysis is differential chromatin accessibility (DA) analysis. However, the data characteristics of scATAC-seq such as excessive zeros and large variability of chromatin accessibility across cells impose a unique challenge for DA analysis. Existing statistical methods focus on detecting the mean difference of the chromatin accessible regions while overlooking the distribution difference. Motivated by real data exploration that distribution difference exists among cell types, we introduce a novel composite statistical test named "scaDA", which is based on zero-inflated negative binomial model (ZINB), for performing differential distribution analysis of chromatin accessibility by jointly testing the abundance, prevalence and dispersion simultaneously. Benefiting from both dispersion shrinkage and iterative refinement of mean and prevalence parameter estimates, scaDA demonstrates its superiority to both ZINB-based likelihood ratio tests and published methods by achieving the highest power and best FDR control in a comprehensive simulation study. In addition to demonstrating the highest power in three real sc-multiome data analyses, scaDA successfully identifies differentially accessible regions in microglia from sc-multiome data for an Alzheimer's disease (AD) study that are most enriched in GO terms related to neurogenesis and the clinical phenotype of AD, and AD-associated GWAS SNPs.

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
scaDA:用于单细胞染色质可及性测序数据差异分析的新型统计方法。
单细胞 ATAC-seq 测序数据(scATAC-seq)已被广泛用于研究单细胞水平的染色质可及性。scATAC-seq数据分析的一个重要应用是差异染色质可及性(DA)分析。然而,scATAC-seq 的数据特征(如过多的零和细胞间染色质可及性的巨大变异性)给 DA 分析带来了独特的挑战。现有的统计方法侧重于检测染色质可及区域的平均差异,而忽略了分布差异。基于对细胞类型间存在分布差异的实际数据的探索,我们引入了一种名为 "scaDA "的新型复合统计检验,它基于零膨胀负二项模型(ZINB),通过同时检测丰度、流行度和离散度来进行染色质可及性的差异分布分析。得益于离散度缩小和平均值与流行度参数估计的迭代改进,ScaDA 在一项综合模拟研究中取得了最高的功率和最佳的 FDR 控制,证明了它优于基于 ZINB 的似然比检验和已发表的方法。除了在三项真实 sc-multiome 数据分析中显示出最高的功率之外,scaDA 还成功地从一项阿尔茨海默病(AD)研究的 sc-multiome 数据中识别出了小胶质细胞中的不同可访问区域,这些区域在与神经发生和 AD 临床表型相关的 GO 术语以及与 AD 相关的 GWAS SNP 中最为富集。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
PLoS Computational Biology
PLoS Computational Biology BIOCHEMICAL RESEARCH METHODS-MATHEMATICAL & COMPUTATIONAL BIOLOGY
CiteScore
7.10
自引率
4.70%
发文量
820
审稿时长
2.5 months
期刊介绍: PLOS Computational Biology features works of exceptional significance that further our understanding of living systems at all scales—from molecules and cells, to patient populations and ecosystems—through the application of computational methods. Readers include life and computational scientists, who can take the important findings presented here to the next level of discovery. Research articles must be declared as belonging to a relevant section. More information about the sections can be found in the submission guidelines. Research articles should model aspects of biological systems, demonstrate both methodological and scientific novelty, and provide profound new biological insights. Generally, reliability and significance of biological discovery through computation should be validated and enriched by experimental studies. Inclusion of experimental validation is not required for publication, but should be referenced where possible. Inclusion of experimental validation of a modest biological discovery through computation does not render a manuscript suitable for PLOS Computational Biology. Research articles specifically designated as Methods papers should describe outstanding methods of exceptional importance that have been shown, or have the promise to provide new biological insights. The method must already be widely adopted, or have the promise of wide adoption by a broad community of users. Enhancements to existing published methods will only be considered if those enhancements bring exceptional new capabilities.
期刊最新文献
Shedding light on blue-green photosynthesis: A wavelength-dependent mathematical model of photosynthesis in Synechocystis sp. PCC 6803 Design and implementation of an asynchronous online course-based undergraduate research experience (CURE) in computational genomics An exploration into CTEPH medications: Combining natural language processing, embedding learning, in vitro models, and real-world evidence for drug repurposing Calibrating dimension reduction hyperparameters in the presence of noise Comparison and benchmark of deep learning methods for non-coding RNA classification
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1