Faster and more accurate assessment of differential transcript expression with Gibbs sampling and edgeR v4.

NAR Genomics and Bioinformatics (IF 2.8; JCR Q1, Genetics & Heredity). Published: 2024-11-04. eCollection date: 2024-09-01. DOI: 10.1093/nargab/lqae151

Pedro L Baldoni, Lizhong Chen, Gordon K Smyth

NAR Genomics and Bioinformatics, volume 6, issue 4, article lqae151. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11532793/pdf/


Abstract

This article further develops edgeR's divided-count approach for differential transcript expression (DTE) analysis of RNA-seq data to produce a faster and more accurate pipeline. The divided-count approach models the precision of transcript quantifications from the kallisto and Salmon software tools and divides the estimated overdispersions out of the transcript read counts, after which the divided counts can be analysed by statistical tools developed for gene-level counts. This article adds three new refinements to the pipeline that dramatically decrease the computational overhead and storage requirements so that DTE analysis of very large datasets becomes practical. The new pipeline replaces bootstrap with Gibbs resampling and replaces edgeR v3 with v4. Both of these changes improve statistical power and accuracy and provide better resolution for low-count transcripts. The accuracy of overdispersion estimation is shown to depend on the total number of resamples across the whole dataset rather than on individual samples, dramatically reducing the recommended number of technical samples for large datasets. Test data and extensive simulation data show that the new pipeline is more powerful and efficient than previous DTE pipelines while providing correct control of the false discovery rate for any sample size.
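The divided-count idea described above can be illustrated with a minimal Python sketch. This is not edgeR's implementation (edgeR is an R package), and the function names `estimate_overdispersion` and `divided_counts` are hypothetical. The sketch assumes that each transcript's overdispersion can be approximated by a pooled variance-to-mean ratio of the resampled counts, floored at 1, and it omits any moderation or filtering that the real pipeline may apply. It also shows why the total number of resamples matters: the degrees of freedom behind each estimate grow with the number of samples times the number of resamples per sample.

```python
import numpy as np

def estimate_overdispersion(resamples):
    """Pooled variance-to-mean ratio per transcript (simplified sketch).

    resamples: array of shape (n_transcripts, n_samples, n_resamples)
    holding resampled (bootstrap or Gibbs) counts from the quantifier.
    Returns (overdispersion, degrees_of_freedom), one value per transcript.
    """
    resamples = np.asarray(resamples, dtype=float)
    n_resamples = resamples.shape[2]
    mean = resamples.mean(axis=2)
    var = resamples.var(axis=2, ddof=1)
    # Transcript/sample pairs with zero mean carry no information.
    valid = mean > 0
    ratio = np.where(valid, var / np.where(valid, mean, 1.0), 0.0)
    # Degrees of freedom pool over samples: n_samples * (n_resamples - 1).
    df = valid.sum(axis=1) * (n_resamples - 1)
    stat = (ratio * (n_resamples - 1)).sum(axis=1)
    overdisp = np.where(df > 0, stat / np.maximum(df, 1), 1.0)
    # Assumption: a ratio below 1 is treated as no extra quantification noise.
    return np.maximum(overdisp, 1.0), df

def divided_counts(counts, overdisp):
    """Divide each transcript's counts by its estimated overdispersion."""
    return np.asarray(counts, dtype=float) / overdisp[:, None]

# Simulated illustration (not real data): 2 transcripts, 6 samples, 50 resamples.
rng = np.random.default_rng(0)
n_samples, n_resamples = 6, 50
# Transcript 0: unambiguous assignment, Poisson-like resampling noise.
precise = rng.poisson(100, size=(n_samples, n_resamples))
# Transcript 1: ambiguous assignment, variance about 4x the mean.
ambiguous = 4 * rng.poisson(25, size=(n_samples, n_resamples))
resamples = np.stack([precise, ambiguous])

overdisp, df = estimate_overdispersion(resamples)
counts = np.full((2, n_samples), 100)
divided = divided_counts(counts, overdisp)
print("overdispersion:", overdisp)
print("degrees of freedom:", df)
print("divided counts:", divided)
```

In this toy setup the ambiguously assigned transcript has its counts scaled down by roughly its variance inflation, while the precisely quantified transcript is left nearly unchanged. Because the degrees of freedom pool across samples, a dataset with many samples can reach the same precision with fewer resamples per sample.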


