Dissecting tumor transcriptional heterogeneity from single-cell RNA-seq data by generalized binary covariance decomposition
Yusha Liu, Peter Carbonetto, Jason Willwerscheid, Scott A. Oakes, Kay F. Macleod, Matthew Stephens
Nature Genetics, volume 57, issue 1, pages 263-273. Published 2025-01-02. DOI: 10.1038/s41588-024-01997-z
https://www.nature.com/articles/s41588-024-01997-z
Abstract
Profiling tumors with single-cell RNA sequencing has the potential to identify recurrent patterns of transcription variation related to cancer progression, and to produce therapeutically relevant insights. However, strong intertumor heterogeneity can obscure more subtle patterns that are shared across tumors. Here we introduce a statistical method, generalized binary covariance decomposition (GBCD), to address this problem. We show that GBCD can decompose transcriptional heterogeneity into interpretable components (including patient-specific, dataset-specific and shared components relevant to disease subtypes) and that, in the presence of strong intertumor heterogeneity, it can produce more interpretable results than existing methods. Applied to data on pancreatic ductal adenocarcinoma, GBCD produced a refined characterization of existing tumor subtypes, and identified a gene expression program prognostic of poor survival independent of tumor stage and subtype. The gene expression program is enriched for genes involved in stress responses, and suggests a role for the integrated stress response in pancreatic ductal adenocarcinoma.
Editor's summary: Generalized binary covariance decomposition (GBCD) applies empirical Bayes matrix factorization to identify shared and sample-specific gene expression signatures in single-cell RNA sequencing data, and can more accurately capture inter- and intrasample heterogeneity than existing methods.
About the journal:
Nature Genetics publishes the very highest quality research in genetics. It encompasses genetic and functional genomic studies on human and plant traits and on other model organisms. Current emphasis is on the genetic basis for common and complex diseases and on the functional mechanism, architecture and evolution of gene networks, studied by experimental perturbation.
Integrative genetic topics comprise, but are not limited to:
-Genes in the pathology of human disease
-Molecular analysis of simple and complex genetic traits
-Cancer genetics
-Agricultural genomics
-Developmental genetics
-Regulatory variation in gene expression
-Strategies and technologies for extracting function from genomic data
-Pharmacological genomics
-Genome evolution