Metacell-based differential expression analysis identifies cell type specific temporal gene response programs in COVID-19 patient PBMCs
Kevin O’Leary, Deyou Zheng
NPJ Systems Biology and Applications, published 2024-04-05 (Journal Article)
DOI: 10.1038/s41540-024-00364-2 (https://doi.org/10.1038/s41540-024-00364-2)
Abstract
By profiling gene expression in individual cells, single-cell RNA sequencing (scRNA-seq) can resolve cellular heterogeneity and cell-type-specific gene expression dynamics. Applied to time-series samples, it can identify temporal gene programs active in different cell types, for example, immune cells’ responses to viral infection. However, current scRNA-seq analysis has three limitations. The first is the low number of genes detected per cell. The second is the insufficient number of replicates (often 1-2) due to high experimental cost. The third lies in the data analysis: treating individual cells as independent measurements leads to inflated statistical significance. To address these, we explore a new computational framework, specifically whether “metacells” constructed to maintain cellular heterogeneity within individual cell types (or clusters) can serve as “replicates” to increase statistical rigor. To this end, we applied SEACells to a time-series scRNA-seq dataset of peripheral blood mononuclear cells (PBMCs) after SARS-CoV-2 infection to construct metacells, and used them in maSigPro for quadratic regression to find significantly differentially expressed genes (DEGs) over time, followed by clustering of expression velocity trends. We showed that these metacells retained greater expression variance and produced more biologically meaningful DEGs than either randomly generated metacells or simple pseudobulk methods. More specifically, this approach correctly identified the known ISG15 interferon response program in almost all PBMC cell types, as well as many DEGs enriched in the previously defined SARS-CoV-2 infection response pathway. It also uncovered additional temporal gene expression programs that are more cell type-specific. Overall, our results demonstrate that the metacell-pseudoreplicate strategy could potentially overcome the limitation of having only 1-2 replicates.
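To make the metacell-as-pseudoreplicate idea more concrete, here is a minimal Python sketch, assuming numpy, normalized expression, and precomputed metacell labels in which each metacell contains cells from a single time point. It averages cells into metacell profiles and, for each gene, fits expression as a quadratic function of time, comparing that fit with an intercept-only model via an F statistic. This is a simplified analogue written for illustration, not SEACells or maSigPro: it omits elements such as maSigPro's experimental-group design variables, stepwise regression, and multiple-testing correction, and the function names are hypothetical. The metacell labels below are drawn at random only to keep the example self-contained; the abstract reports that SEACells-derived metacells outperformed randomly generated ones.

```python
import numpy as np


def aggregate_metacells(expr, time, metacell):
    """Average single cells into metacell profiles used as pseudoreplicates.

    expr     : (n_cells, n_genes) normalized expression matrix
    time     : (n_cells,) sampling time of each cell
    metacell : (n_cells,) metacell label per cell (assumed given, e.g. by a metacell method)
    Returns one averaged profile and one time value per metacell.
    """
    labels = np.unique(metacell)
    profiles = np.vstack([expr[metacell == m].mean(axis=0) for m in labels])
    times = []
    for m in labels:
        t = np.unique(time[metacell == m])
        # Assumption: every metacell is built from cells of a single time point.
        if t.size != 1:
            raise ValueError(f"metacell {m} mixes time points {t}")
        times.append(t[0])
    return profiles, np.asarray(times, dtype=float)


def quadratic_time_ftest(profiles, times):
    """Per-gene F-test of y = b0 + b1*t + b2*t^2 against an intercept-only model.

    Returns the F statistic and R^2 for each gene plus the (model, residual)
    degrees of freedom; p-values would follow from an F(2, n - 3) distribution.
    """
    n = profiles.shape[0]
    if n <= 3 or np.unique(times).size < 3:
        raise ValueError("need >3 metacells spanning at least 3 distinct time points")
    design = np.column_stack([np.ones(n), times, times ** 2])
    beta, *_ = np.linalg.lstsq(design, profiles, rcond=None)
    rss_full = ((profiles - design @ beta) ** 2).sum(axis=0)
    rss_null = ((profiles - profiles.mean(axis=0)) ** 2).sum(axis=0)
    df_model, df_resid = 2, n - 3
    with np.errstate(divide="ignore", invalid="ignore"):
        f_stat = ((rss_null - rss_full) / df_model) / (rss_full / df_resid)
        r2 = 1.0 - rss_full / rss_null
    return f_stat, r2, (df_model, df_resid)


# Synthetic example: 3 time points with 200 cells each, 2 genes.
rng = np.random.default_rng(0)
time = np.repeat([0.0, 1.0, 2.0], 200)
expr = rng.normal(0.0, 0.5, size=(600, 2))
expr[:, 0] += 2.0 * time - time ** 2  # gene 0: transient rise and fall
# gene 1 stays flat (noise only)

# Stand-in metacell labels: 10 random groups of 20 cells within each time point.
metacell = np.concatenate(
    [10 * k + rng.permutation(np.repeat(np.arange(10), 20)) for k in range(3)]
)

profiles, mc_times = aggregate_metacells(expr, time, metacell)
f_stat, r2, dof = quadratic_time_ftest(profiles, mc_times)
print(profiles.shape, dof)  # (30, 2) (2, 27)
print(np.round(f_stat, 1), np.round(r2, 2))
```

With 30 metacells the residual degrees of freedom are 27; fitting the same model directly to the 600 individual cells would give 597, which is the kind of cell-as-replicate counting the abstract cautions against.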
About the journal:
npj Systems Biology and Applications is an online Open Access journal dedicated to publishing premier research that takes a systems-oriented approach. The journal aims to provide a forum for the presentation of articles that help define this nascent field, as well as those that apply the advances to wider fields. We encourage studies that integrate, or aid the integration of, data, analyses and insight from molecules to organisms and broader systems. Important areas of interest include not only fundamental biological systems and drug discovery, but also applications to health, medical practice and implementation, big data, biotechnology, food science, human behaviour, broader biological systems and industrial applications of systems biology.
We encourage all approaches, including network biology, application of control theory to biological systems, computational modelling and analysis, comprehensive and/or high-content measurements, theoretical, analytical and computational studies of system-level properties of biological systems and computational/software/data platforms enabling such studies.