Many African great ape chromosomes possess large subterminal heterochromatic caps at their telomeres that are conspicuously absent from the human lineage. Leveraging the complete sequences of great ape genomes, we characterize the organization of subterminal caps and reconstruct the evolutionary history of these regions in chimpanzees and gorillas. Detailed analyses of the composition of the associated terminal 32 bp satellite array from chimpanzee (termed pCht) and intervening segmental duplication (SD) spacers confirm two independent origins in the Pan and gorilla lineages. In chimpanzee and bonobo, we estimate these structures emerged ∼7.7 million years ago (MYA) in contrast to gorilla, in which they expanded more recently, ∼5.0 MYA, and now make up 8.5% of the total gorilla genome. In both lineages, the SD spacers punctuating the pCht heterochromatic satellite arrays correspond to pockets of decreased methylation, although in gorilla such regions are significantly less methylated (P < 2.2 × 10⁻¹⁶) than in chimpanzee or bonobo. Allelic pairs of subterminal caps show a higher degree of sequence divergence than euchromatic sequences, with bonobo showing less divergent haplotypes and less differentially methylated spacers. In contrast, we identify virtually identical subterminal caps mapping to nonhomologous chromosomes within a species, suggesting ectopic recombination potentially mediated by SD spacers. We find that the transition regions from heterochromatic subterminal caps to euchromatin are enriched for structural variant insertions and lineage-specific duplicated genes. Our findings suggest independent evolution of subterminal caps converging on a common genetic and epigenetic structure that promoted ectopic exchange as well as the emergence of novel genes at transition regions between euchromatin and heterochromatin.
{"title":"Epigenetic and evolutionary features of ape subterminal heterochromatin.","authors":"DongAhn Yoo, Katherine M Munson, Evan E Eichler","doi":"10.1101/gr.280987.125","DOIUrl":"10.1101/gr.280987.125","url":null,"PeriodicalId":12678,"journal":{"name":"Genome research","volume":" ","pages":"38-49"},"PeriodicalIF":5.5,"publicationDate":"2026-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12758386/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145344936","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Genomic imprinting is a specialized mechanism of transcriptional regulation whereby approximately 200 mammalian genes are expressed monoallelically according to their parental origin. This crucial developmental process is primarily controlled by discrete cis-regulatory elements known as imprinting control regions (ICRs), which play essential roles in directing allele-specific gene expression across large imprinted domains. In this review, we highlight the features that define ICRs as a distinct class of cis-regulatory regions, from their ability to maintain germline-inherited DNA methylation to their multifunctional roles in transcriptional control. For each imprinted domain, we examine the diverse mechanisms by which individual ICRs integrate multiple regulatory functions to coordinate both proximal and distal imprinted gene expression. By uncovering the multifaceted roles of ICRs, this review provides a compelling framework for understanding, more broadly, the molecular basis of finely controlled gene expression.
{"title":"The superpowers of imprinting control regions.","authors":"Bertille Montibus, Franck Court, Philippe Arnaud","doi":"10.1101/gr.281215.125","DOIUrl":"10.1101/gr.281215.125","url":null,"PeriodicalId":12678,"journal":{"name":"Genome research","volume":" ","pages":"1-19"},"PeriodicalIF":5.5,"publicationDate":"2026-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12758399/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145722442","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ruhollah Shemirani, Gillian M Belbin, Sinead Cullina, Christa Caggiano, Christopher Gignoux, Noah Zaitlen, Eimear Kenny
Population structure is a well-known confounder in statistical genetics, particularly in genome-wide association studies (GWAS), where it can lead to inflated test statistics and spurious associations. Traditional methods, such as principal components (PCs), commonly used to adjust for population structure, are limited in capturing fine-scale, nonlinear patterns that arise from recent demographic events, patterns that are crucial for understanding rare variant effects. To address this challenge, we propose a novel method called SPectral Components (SPCs), which leverages identity-by-descent (IBD) graphs to capture and transform local, nonlinear fine-scale population structure into continuous representations that can be seamlessly integrated into genetic analysis pipelines. Using both simulated datasets and empirical data from the UK Biobank (N ≈ 420,000), we demonstrate that SPCs outperform PCs in adjusting for fine-scale population structure. In simulations, SPCs explained over 90% of the fine-scale population structure with fewer components, while PCs captured less than 5%. In the UK Biobank, SPCs reduced the inflation of P-values in the GWAS of an environmentally driven phenotype by 12% compared to PCs, while maintaining a similar performance to PCs in height, a highly heritable phenotype. Additionally, SPCs improved rare variant association analyses, reducing genomic inflation (e.g., from 7.6 to 1.2 in one analysis), and provided more accurate heritability estimates. Spatial autocorrelation analysis further confirmed the ability of SPCs to account for environmental effects, reducing Moran's I for both environmental and heritable phenotypes more effectively than PCs. Overall, our findings demonstrate that SPCs provide a robust, scalable adjustment for recent population structure, offering a powerful alternative or complement to PCs in large-scale biobank studies.
{"title":"A spectral component approach leveraging Identity-by-Descent graphs to address recent population structure in genomic analysis.","authors":"Ruhollah Shemirani, Gillian M Belbin, Sinead Cullina, Christa Caggiano, Christopher Gignoux, Noah Zaitlen, Eimear Kenny","doi":"10.1101/gr.280659.125","DOIUrl":"10.1101/gr.280659.125","url":null,"PeriodicalId":12678,"journal":{"name":"Genome research","volume":" ","pages":""},"PeriodicalIF":5.5,"publicationDate":"2025-12-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145818974","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
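To make the idea of turning an IBD graph into continuous covariates concrete, the following is a minimal sketch of a generic spectral embedding. It is not the authors' SPC implementation: the function name `spectral_components`, the toy sharing matrix, and the choice of the symmetric normalized Laplacian are all assumptions made for illustration.

```python
import numpy as np

def spectral_components(ibd, k):
    """Return k spectral coordinates per sample from a symmetric IBD-sharing matrix.

    Hypothetical helper: a plain Laplacian eigenmap, assuming a connected graph.
    """
    ibd = np.asarray(ibd, dtype=float)
    deg = ibd.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0
    inv_sqrt[nonzero] = 1.0 / np.sqrt(deg[nonzero])
    # Symmetric normalized Laplacian: L = I - D^{-1/2} A D^{-1/2}
    lap = np.eye(len(deg)) - inv_sqrt[:, None] * ibd * inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order for symmetric matrices
    _, vecs = np.linalg.eigh(lap)
    # Skip the first eigenvector (eigenvalue 0 for a connected graph), keep the next k
    return vecs[:, 1:k + 1]

# Toy data (made up): two groups of three samples with strong within-group
# IBD sharing and weak sharing between groups.
ibd = np.full((6, 6), 0.1)
ibd[:3, :3] = 5.0
ibd[3:, 3:] = 5.0
np.fill_diagonal(ibd, 0.0)

spc = spectral_components(ibd, k=1)
print(spc.ravel())  # the sign of the first component separates the two groups
```

Columns produced this way could be added as covariates in an association model in the same way PCs are. How the published method weights edges or selects components is not stated in the abstract, so those details are left out here.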
Wei-Hao Lee, Lechuan Li, Ruth Dannenfelser, Vicky Yao
As sequencing techniques advance in precision, affordability, and diversity, an abundance of heterogeneous sequencing data has become available, encompassing a wide range of phenotypic features and biological perturbations. Unfortunately, increased resolution comes at the cost of increased complexity of the biological search space, even at the individual study level, as perturbations are now often examined across many dimensions simultaneously, including different donor phenotypes, anatomical regions and cell types, and time points. Furthermore, broad integration across studies promises a unique opportunity to explore the molecular underpinnings of distinct healthy and disease states, larger than the original scope of the individual study. To fully realize the promise of both individual higher resolution studies and large cross-study integrations, we need a robust methodology that can disentangle the influence of technical and nonrelevant phenotypic factors, isolating relevant condition-specific signals from shared biological information while also providing interpretable insights into the genetic effects of these conditions. Current methods typically excel in only one of these areas. To address this gap, we have developed ALPINE, a supervised nonnegative matrix factorization (NMF) framework that effectively separates both technical and nontechnical factors while simultaneously offering direct interpretability of condition-associated genes. Through simulations across four different scenarios, we demonstrate that ALPINE outperforms existing methods in both isolating the effect of different phenotypic conditions and prioritizing condition-associated genes. Furthermore, ALPINE has favorable performance in batch effect removal compared with state-of-the-art integration methods. When applied to real-world case studies, we showcase how ALPINE can be used to extract insights into the biological mechanisms that underlie differences between phenotypic conditions.
{"title":"Interpretable phenotype decoding from multicondition sequencing data with ALPINE.","authors":"Wei-Hao Lee, Lechuan Li, Ruth Dannenfelser, Vicky Yao","doi":"10.1101/gr.280566.125","DOIUrl":"10.1101/gr.280566.125","url":null,"PeriodicalId":12678,"journal":{"name":"Genome research","volume":" ","pages":"2756-2769"},"PeriodicalIF":5.5,"publicationDate":"2025-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12667713/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145344885","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
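For readers unfamiliar with NMF, the sketch below shows the plain, unsupervised building block using the classic multiplicative updates for the Frobenius objective. It is only an illustration of the factorization itself; the supervised terms that ALPINE adds for condition labels are not described in the abstract and are not modeled here. The function name `nmf` and the toy matrix are assumptions.

```python
import numpy as np

def nmf(X, rank, n_iter=500, seed=0, eps=1e-9):
    """Factor a nonnegative matrix X (samples x genes) as W @ H with W, H >= 0.

    Hypothetical helper using standard multiplicative updates; no supervision.
    """
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], rank)) + eps
    H = rng.random((rank, X.shape[1])) + eps
    for _ in range(n_iter):
        # Each update rescales the factor elementwise, so entries stay nonnegative
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy expression-like matrix (made up) with an exact rank-2 nonnegative structure
rng = np.random.default_rng(1)
X = rng.random((8, 2)) @ rng.random((2, 6))

W, H = nmf(X, rank=2)
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
print(f"relative reconstruction error: {rel_err:.4f}")
```

In this layout the rows of `H` act as gene loadings per factor, which is what makes NMF-style models directly interpretable at the gene level.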
Antonio Nappi, Liubov Shilova, Theofanis Karaletsos, Na Cai, Francesco Paolo Casale
Gene-level rare variant association tests (RVATs) are essential for uncovering disease mechanisms and identifying therapeutic targets. Advances in sequence-based machine learning have generated diverse variant pathogenicity scores, creating opportunities to improve RVATs. However, existing methods often rely on rigid models or single annotations, limiting their ability to leverage these advances. Here, we introduce BayesRVAT, a Bayesian rare variant association test that jointly models multiple annotations. By specifying priors on annotation effects and estimating gene- and trait-specific posterior burden scores, BayesRVAT flexibly captures diverse rare-variant architectures. In simulations, BayesRVAT improves power while maintaining calibration. In UK Biobank analyses, it detects 10.2% more blood-trait associations and reveals novel gene-disease links, including PRPH2 with retinal disease. Integrating BayesRVAT within omnibus frameworks further increases discoveries, demonstrating that flexible annotation modeling captures complementary signals beyond existing burden and variance-component tests.
{"title":"BayesRVAT enhances rare-variant association testing through Bayesian aggregation of functional annotations.","authors":"Antonio Nappi, Liubov Shilova, Theofanis Karaletsos, Na Cai, Francesco Paolo Casale","doi":"10.1101/gr.280689.125","DOIUrl":"10.1101/gr.280689.125","url":null,"PeriodicalId":12678,"journal":{"name":"Genome research","volume":" ","pages":"2682-2690"},"PeriodicalIF":5.5,"publicationDate":"2025-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12667389/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145367897","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
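As a rough illustration of what an annotation-weighted burden score is, the sketch below collapses rare variants in one gene into a per-individual score and regresses a trait on it. This is a fixed-weight burden test, not BayesRVAT: the Bayesian priors and posterior estimation described above are omitted, and the function names, annotation weights, and toy data are all hypothetical.

```python
import numpy as np

def burden_scores(genotypes, annotations, annot_weights):
    """Collapse rare variants into one score per individual.

    genotypes:     (n_individuals, n_variants) minor-allele counts
    annotations:   (n_variants, n_annotations) functional scores
    annot_weights: (n_annotations,) weights, fixed here rather than inferred
    """
    variant_weights = annotations @ annot_weights
    return genotypes @ variant_weights

def ols_slope(x, y):
    """Least-squares slope of y on x (with intercept)."""
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))

# Toy gene (made up): 4 individuals, 3 rare variants, 2 annotations
genotypes = np.array([[0, 0, 0],
                      [1, 0, 0],
                      [0, 1, 0],
                      [0, 0, 1]], dtype=float)
annotations = np.array([[1.0, 0.0],
                        [0.0, 1.0],
                        [1.0, 1.0]])
annot_weights = np.array([0.5, 2.0])

burden = burden_scores(genotypes, annotations, annot_weights)
trait = 3.0 * burden + 1.0          # noiseless toy trait
print(burden, ols_slope(burden, trait))
```

The design point is that the annotation weights determine which variants dominate the score. Treating those weights as uncertain quantities with priors, rather than fixing them as done here, is the part the abstract attributes to BayesRVAT.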
A fundamental goal of genetics is to identify which genetic variants are associated with a trait, and how, often using the regression results from genome-wide association (GWA) studies. Important methodological challenges include accounting for inflation in GWA effect estimates and investigating more than one trait simultaneously. We leverage machine learning approaches for these two challenges, developing a computationally efficient method called ML-MAGES. First, we shrink the inflation in GWA effect sizes caused by nonindependence among variants using neural networks. We then cluster variant associations among multiple traits via variational inference. We compare the performance of shrinkage via neural networks to regularized regression and fine-mapping, two approaches used for addressing inflated effects but dealing with variants in focal regions of different sizes. Our neural network shrinkage outperforms both methods in approximating the true effect sizes in simulated data. Our infinite mixture clustering approach offers a flexible, data-driven way to distinguish different types of associations (trait-specific, shared across traits, or nonprioritized) among multiple traits based on their regularized effects. Clustering applied to our neural network shrinkage results also produces consistently higher precision and recall for distinguishing gene-level associations in simulations. We demonstrate the application of ML-MAGES on association analyses of two quantitative traits and two binary traits in the UK Biobank. Our identified associated genes from single-trait enrichment tests overlap with those having known relevant biological processes to the traits. Besides trait-specific associations, ML-MAGES identifies several variants with shared multitrait associations, suggesting putative shared genetic architecture.
{"title":"ML-MAGES enables multivariate genetic association analyses with genes and effect size shrinkage","authors":"Xiran Liu, Lorin Crawford, Sohini Ramachandran","doi":"10.1101/gr.280440.125","DOIUrl":"https://doi.org/10.1101/gr.280440.125","url":null,"PeriodicalId":12678,"journal":{"name":"Genome research","volume":"19 1","pages":""},"PeriodicalIF":7.0,"publicationDate":"2025-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145553433","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
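The inflation caused by nonindependence among variants can be illustrated with a small worked example. Under the usual assumption of standardized genotypes, the expected marginal GWA effects equal the linkage disequilibrium (LD) correlation matrix times the joint effects, so a variant with no effect of its own inherits signal from a correlated neighbor. The sketch below undoes this with a ridge-regularized solve, which is in the spirit of the regularized-regression comparator mentioned above and is not the neural network used by ML-MAGES. The function name `ridge_deconvolve` and all numbers are hypothetical.

```python
import numpy as np

def ridge_deconvolve(marginal_effects, ld, lam=0.0):
    """Estimate joint effects from marginal effects by solving (R + lam*I) b = b_marginal.

    Hypothetical helper; lam > 0 adds ridge shrinkage for noisy or ill-conditioned LD.
    """
    ld = np.asarray(ld, dtype=float)
    return np.linalg.solve(ld + lam * np.eye(ld.shape[0]), marginal_effects)

# Two variants in strong LD (r = 0.9); only the first is truly associated
ld = np.array([[1.0, 0.9],
               [0.9, 1.0]])
true_effects = np.array([1.0, 0.0])

# Noiseless marginal effects: the second variant looks associated purely through LD
marginal = ld @ true_effects

recovered = ridge_deconvolve(marginal, ld)           # exact inverse, lam = 0
shrunk = ridge_deconvolve(marginal, ld, lam=0.1)     # regularized version
print(marginal, recovered, shrunk)
```

With real, noisy summary statistics the exact inverse is unstable, which is one reason shrinkage or learned approaches are preferred over the `lam=0` case shown here.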
Robin Kosch, Katharina Limm, Annette M. Staiger, Nadine S. Kurz, Nicole Seifert, Bence Oláh, Stefan Solbrig, Viola Poeschel, Gerhard Held, Marita Ziepert, Norbert Schmitz, Emil Chteinberg, Reiner Siebert, Rainer Spang, Helena U. Zacharias, German Ott, Peter J. Oefner, Michael Altenbuchinger
High-throughput bottom-up proteomics data cover thousands of proteins and related co- and post-translational modifications (CTMs/PTMs). Yet, it remains an open question how to holistically explore such data and their relationship to complementary omics/phenotypical information. Graphical models are particularly suited to study molecular networks and underlying regulatory mechanisms, as they can distinguish direct from indirect relationships, aside from their generalizability to diverse data types. We propose PriOmics to integrate proteomics data with complementary omics and phenotypical data. PriOmics models intensities of individual proteotypic peptides and incorporates their protein affiliation as prior knowledge to resolve statistical relationships between proteins and CTMs/PTMs. This was verified in simulation studies, which also demonstrate that PriOmics can disentangle regulatory effects of protein modifications from those of respective protein abundances. These findings were substantiated in a Diffuse Large B-Cell Lymphoma (DLBCL) dataset where we integrated SWATH-MS-based proteomics with transcriptomic and phenotypic data.
{"title":"Integration of high-throughput proteomic data and complementary omics layers with PriOmics","authors":"Robin Kosch, Katharina Limm, Annette M. Staiger, Nadine S. Kurz, Nicole Seifert, Bence Oláh, Stefan Solbrig, Viola Poeschel, Gerhard Held, Marita Ziepert, Norbert Schmitz, Emil Chteinberg, Reiner Siebert, Rainer Spang, Helena U. Zacharias, German Ott, Peter J. Oefner, Michael Altenbuchinger","doi":"10.1101/gr.279487.124","DOIUrl":"https://doi.org/10.1101/gr.279487.124","url":null,"PeriodicalId":12678,"journal":{"name":"Genome research","volume":"32 1","pages":""},"PeriodicalIF":7.0,"publicationDate":"2025-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145545793","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
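The statement that graphical models distinguish direct from indirect relationships can be shown with a three-variable Gaussian example. In the sketch below, variables 0 and 2 are linked only through variable 1: their marginal correlation is clearly nonzero, yet their partial correlation, read off the inverse covariance (precision) matrix, is zero. This is a textbook illustration under a Gaussian assumption, not the PriOmics model; the helper name `partial_correlations` and the toy precision matrix are made up.

```python
import numpy as np

def partial_correlations(cov):
    """Partial correlation of each pair given all other variables.

    Uses pcor_ij = -P_ij / sqrt(P_ii * P_jj), where P is the precision matrix.
    """
    prec = np.linalg.inv(cov)
    scale = np.sqrt(np.diag(prec))
    pcor = -prec / np.outer(scale, scale)
    np.fill_diagonal(pcor, 1.0)
    return pcor

# Chain 0 -- 1 -- 2: the zero in the precision matrix encodes "no direct edge 0 -- 2"
precision = np.array([[ 2.0, -1.0,  0.0],
                      [-1.0,  2.0, -1.0],
                      [ 0.0, -1.0,  2.0]])
cov = np.linalg.inv(precision)

sd = np.sqrt(np.diag(cov))
corr = cov / np.outer(sd, sd)          # marginal correlations
pcor = partial_correlations(cov)       # direct (conditional) associations

print("marginal corr(0, 2):", round(corr[0, 2], 3))
print("partial  corr(0, 2):", round(abs(pcor[0, 2]), 3))
```

In practice the precision matrix is estimated from data with sparsity-inducing penalties rather than specified by hand, and the abstract describes using peptide-to-protein affiliation as prior knowledge in that estimation.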
Aging compromises intestinal integrity, yet the chromatin changes driving this decline remain unclear. Polycomb-mediated repression is essential for silencing developmental genes, but this regulatory mechanism becomes dysregulated with age. Although shifts in Polycomb regulation within intestinal stem cells have been linked to gut aging, the Polycomb landscape of differentiated cell types remains unexplored. Differentiated cells comprise the majority of the gut epithelium and directly impact both tissue and whole organismal aging. Using single-cell chromatin profiling of the Drosophila intestine, we identify cell type-specific chromatin landscape changes during aging. We find that old enterocytes aberrantly repress genes essential for transmembrane transport and chitin metabolism, contributing to intestinal barrier decline – an example of antagonistic pleiotropy in a regenerative tissue. Barrier decline leads to derepression of JAK/STAT ligands in all cell types and increased proliferation of aging stem cells, with elevated RNA Polymerase II (RNAPII) at S-phase-dependent histone genes. Specific upregulation of histone genes during aging stem cell proliferation resembles RNAPII hypertranscription of histone genes in aggressive human cancers. Our work reveals that misregulation of the Polycomb-mediated H3K27me3 histone modification in differentiated cells during aging not only underlies tissue decline but also mirrors transcriptional changes in cancer, suggesting a common mechanism linking aging and cancer progression.
{"title":"Polycomb misregulation in enterocytes drives tissue decline in the aging Drosophila intestine","authors":"Sarah Leichter, Kami Ahmad, Steve Henikoff","doi":"10.1101/gr.281058.125","DOIUrl":"https://doi.org/10.1101/gr.281058.125","url":null,"PeriodicalId":12678,"journal":{"name":"Genome research","volume":"7 1","pages":""},"PeriodicalIF":7.0,"publicationDate":"2025-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145536191","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Spatially resolved transcriptomics (SRT) technologies measure gene expression across thousands of spatial locations within a tissue slice. Multiple SRT technologies are currently available and others are in active development, with each technology having varying spatial resolution (subcellular, single-cell, or multicellular regions), gene coverage (targeted vs. whole-transcriptome), and sequencing depth per location. For example, the widely used 10x Genomics Visium platform measures whole transcriptomes from multiple-cell-sized spots, whereas the 10x Genomics Xenium platform measures a few hundred genes at subcellular resolution. A number of studies apply multiple SRT technologies to slices that originate from the same biological tissue. Integration of data from different SRT technologies can overcome limitations of the individual technologies, enabling the imputation of expression from unmeasured genes in targeted technologies and/or the deconvolution of admixed expression from technologies with lower spatial resolution. Here, we introduce Spatial Integration for Imputation and Deconvolution (SIID), an algorithm to reconstruct a latent spatial gene expression matrix from a pair of observations from different SRT technologies. SIID leverages a spatial alignment and uses a joint nonnegative factorization model to accurately impute missing gene expression and infer gene expression signatures of cell types from admixed SRT data. In simulations involving paired SRT data sets from different technologies (e.g., Xenium and Visium), SIID shows superior performance in reconstructing spot-to-cell-type assignments, recovering cell type–specific gene expression, and imputing missing data compared to contemporary tools. When applied to real-world 10x Xenium-Visium pairs from human breast and colon cancer tissues, SIID achieves the highest performance in imputing holdout gene expression.
{"title":"Joint imputation and deconvolution of gene expression across spatial transcriptomics platforms","authors":"Hongyu Zheng, Hirak Sarkar, Benjamin J. Raphael","doi":"10.1101/gr.280555.125","DOIUrl":"https://doi.org/10.1101/gr.280555.125","url":null,"PeriodicalId":12678,"journal":{"name":"Genome research","volume":"3 1","pages":""},"PeriodicalIF":7.0,"publicationDate":"2025-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145536177","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Siavash Raeisi Dehkordi, Zhaoyang Jia, Joey Estabrook, Jen Hauenstein, Neil Miller, Naz Güleray-Lafci, Jürgen Neesen, Alex Hastie, Alka Chaubey, Andy Wing Chun Pang, Paul Dremsek, Vineet Bafna
The whole-genome karyotype refers to the sequence of large chromosomal segments comprising an individual's genotype. Karyotype analysis, which includes identifying aneuploidies and structural rearrangements, is essential for understanding genetic risk factors, informing diagnosis and treatment, and guiding genetic counseling in constitutional disorders. The current karyotyping standard relies on microscopic chromosome examination, a complex and expertise-dependent process with megabase-scale resolution. Optical genome mapping (OGM) technology offers an efficient approach to detect large-scale genomic lesions. Here, we introduce OMKar, a computational method that generates virtual karyotypes from OGM data. OMKar integrates structural variants (SVs) and copy number (CN) variants into a breakpoint graph representation. It re-estimates CNs using integer linear programming to enforce CN balance and then identifies constrained Eulerian paths corresponding to full chromosome structures. OMKar is evaluated on 38 whole-genome simulations of constitutional disorders, achieving 88% precision and 95% recall for SV concordance and a 95% Jaccard score for CN concordance. We further apply OMKar to 154 clinical samples including 50 prenatal, 41 postnatal, and 63 parental genomes collected across 10 sites. It correctly reconstructs the karyotype in 144 cases, including 25 of 25 aneuploidies, 32 of 32 balanced translocations, and 72 of 82 unbalanced rearrangements. Identified disorders include cri-du-chat, Wolf–Hirschhorn, Prader–Willi, Down, and Turner syndromes. Notably, OMKar uncovers plausible genetic mechanisms in five previously unexplained cases. These results demonstrate the accuracy and utility of OMKar for OGM-based constitutional karyotyping.
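As a rough illustration of the path-finding step described above, the following Python sketch finds an Eulerian path through a toy directed multigraph in which each edge is repeated according to its copy number. It is a minimal sketch under stated assumptions and not OMKar's implementation: the function name `eulerian_path`, the `(u, v, copy_number)` edge format, and the toy graph are all hypothetical, it uses plain Hierholzer's algorithm with none of the constraints the abstract mentions, and it omits the integer linear programming step that re-estimates copy numbers.

```python
from collections import Counter, defaultdict


def eulerian_path(edges, start):
    """Return an Eulerian path from `start` in a directed multigraph.

    `edges` is a list of (u, v, copy_number) tuples (hypothetical format);
    an edge with copy number k must be traversed exactly k times.
    Copy numbers are assumed to be positive integers.
    """
    adj = defaultdict(list)   # node -> list of successor nodes (with repeats)
    expected = Counter()      # (u, v) -> number of required traversals
    for u, v, cn in edges:
        adj[u].extend([v] * cn)
        expected[(u, v)] += cn

    # Hierholzer's algorithm, iterative form: walk until stuck, then backtrack.
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if adj[node]:
            stack.append(adj[node].pop())
        else:
            path.append(stack.pop())
    path.reverse()

    # Verify that the walk uses every edge copy exactly once.
    if Counter(zip(path, path[1:])) != expected:
        raise ValueError("no Eulerian path from the given start node")
    return path


# Toy example: the segment B->C has copy number 2 (a tandem duplication),
# and the edge C->B joins the two copies.
toy_edges = [("A", "B", 1), ("B", "C", 2), ("C", "B", 1), ("C", "D", 1)]
print(eulerian_path(toy_edges, "A"))
```

In this toy graph the only walk that uses every edge copy once is A, B, C, B, C, D, which reads as a chromosome carrying the B-to-C segment twice. A graph whose copy numbers are unbalanced raises `ValueError`, which hints at why copy numbers would need to be balanced before paths are extracted.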
{"title":"OMKar automates genome karyotyping using optical maps to identify constitutional abnormalities","authors":"Siavash Raeisi Dehkordi, Zhaoyang Jia, Joey Estabrook, Jen Hauenstein, Neil Miller, Naz Güleray-Lafci, Jürgen Neesen, Alex Hastie, Alka Chaubey, Andy Wing Chun Pang, Paul Dremsek, Vineet Bafna","doi":"10.1101/gr.280536.125","DOIUrl":"https://doi.org/10.1101/gr.280536.125","url":null,"PeriodicalId":12678,"journal":{"name":"Genome research","volume":"11 1","pages":""},"PeriodicalIF":7.0,"publicationDate":"2025-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145515801","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}