Efficient differential expression analysis of large-scale single cell transcriptomics data using dreamlet.

bioRxiv: the preprint server for biology. Pub Date: 2024-11-20. DOI: 10.1101/2023.03.17.533005

Gabriel E Hoffman, Donghoon Lee, Jaroslav Bendl, N M Prashant, Aram Hong, Clara Casey, Marcela Alvia, Zhiping Shao, Stathis Argyriou, Karen Therrien, Sanan Venkatesh, Georgios Voloudakis, Vahram Haroutunian, John F Fullard, Panos Roussos

Abstract

Advances in single-cell and -nucleus transcriptomics have enabled generation of increasingly large-scale datasets from hundreds of subjects and millions of cells. These studies promise to give unprecedented insight into the cell-type-specific biology of human disease. Yet performing differential expression analyses across subjects remains difficult due to challenges in statistical modeling of these complex studies and scaling analyses to large datasets. Our open-source R package dreamlet (DiseaseNeurogenomics.github.io/dreamlet) uses a pseudobulk approach based on precision-weighted linear mixed models to identify genes differentially expressed with traits across subjects for each cell cluster. Designed for data from large cohorts, dreamlet is substantially faster and uses less memory than existing workflows, while supporting complex statistical models and controlling the false positive rate. We demonstrate computational and statistical performance on published datasets, and a novel dataset of 1.4M single nuclei from postmortem brains of 150 Alzheimer's disease cases and 149 controls.
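The abstract names the pseudobulk strategy without spelling it out. As a rough illustration only, the Python sketch below shows the general idea: raw counts are summed over all cells from the same subject within each cell cluster, converted to log2 counts-per-million (using the voom-style formula with a 0.5 prior count), and a per-gene trait effect is then estimated by weighted least squares. This is not dreamlet's code or API (dreamlet is an R package); the function names `pseudobulk`, `log2_cpm` and `weighted_group_difference`, the input layout, and the use of user-supplied weights are all hypothetical. The regression step is a fixed-effects-only simplification that omits the random effects of a linear mixed model.

```python
from collections import defaultdict
import math


def pseudobulk(cells, n_genes):
    """Sum raw counts over all cells sharing a (cluster, subject) pair.

    Hypothetical input layout: `cells` is an iterable of
    (cluster, subject, counts) tuples, where `counts` is a list of
    per-gene integer counts for one cell.
    Returns {cluster: {subject: [summed count per gene]}}.
    """
    out = defaultdict(lambda: defaultdict(lambda: [0] * n_genes))
    for cluster, subject, counts in cells:
        acc = out[cluster][subject]
        for g, c in enumerate(counts):
            acc[g] += c
    # Convert the nested defaultdicts to plain dicts for the caller.
    return {cluster: dict(by_subject) for cluster, by_subject in out.items()}


def log2_cpm(counts, prior=0.5):
    """log2 counts-per-million of one pseudobulk sample (voom-style offsets)."""
    lib_size = sum(counts)
    return [math.log2((c + prior) / (lib_size + 1.0) * 1e6) for c in counts]


def weighted_group_difference(values, groups, weights):
    """Weighted least-squares slope for y ~ intercept + group (group is 0/1).

    With a single binary covariate the model is saturated, so the WLS
    slope equals the difference between the two groups' weighted means.
    Simplification: no random effects, weights supplied by the caller.
    """
    num = {0: 0.0, 1: 0.0}
    den = {0: 0.0, 1: 0.0}
    for y, g, w in zip(values, groups, weights):
        num[g] += w * y
        den[g] += w
    return num[1] / den[1] - num[0] / den[0]


if __name__ == "__main__":
    cells = [
        ("microglia", "s1", [1, 0, 3]),
        ("microglia", "s1", [2, 1, 0]),
        ("microglia", "s2", [0, 5, 1]),
        ("astro", "s1", [4, 4, 4]),
    ]
    pb = pseudobulk(cells, n_genes=3)
    print(pb["microglia"]["s1"])  # [3, 1, 3]
    print(log2_cpm(pb["microglia"]["s1"]))
    print(weighted_group_difference([1.0, 3.0, 5.0, 7.0], [0, 0, 1, 1], [1, 1, 1, 1]))  # 4.0
```

Aggregating to one sample per subject and cluster is what makes the downstream model operate on hundreds of subjects rather than millions of cells; in the approach the abstract describes, the weighted fit is a linear mixed model per gene and cluster rather than the two-group comparison shown here.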
